by Kinjal Sanghvi, Maddhujeet Chandra, Kaivalya Powale at Duke University

The Problem Statement

We collected survey data from passengers traveling through Austin-Bergstrom International Airport. The data was obtained from the US Government website data.gov and contains 37 features and 3501 survey responses. We intend to perform a key driver analysis to understand which features affect customers’ overall satisfaction and what the airport can do to improve its service to passengers.

Data Cleaning

Our data consists of 37 features and looks like this:

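# stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0),
# which the summary() and levels() calls below rely on
survey.df <- read.csv("~/OneDrive - Duke University/Coursework/590.21 Marketing Analytics/Project/Airport_Quarterly_Passenger_Survey.csv", stringsAsFactors = TRUE)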
head(survey.df)
##   Quarter Date.recorded Departure.time
## 1    3Q16    09/04/2016          11:45
## 2    2Q16    05/01/2016          16:45
## 3    2Q16    04/07/2016          11:10
## 4    3Q16    09/02/2016          17:16
## 5    3Q16    08/04/2016           7:49
## 6    3Q16    08/02/2016           9:45
##   Ground.transportation.to.from.airport Parking.facilities
## 1                                     0                  0
## 2                                     0                  0
## 3                                     4                  4
## 4                                     0                  0
## 5                                     5                  0
## 6                                     5                  5
##   Parking.facilities..value.for.money. Availability.of.baggage.carts
## 1                                    0                             0
## 2                                    0                             0
## 3                                    4                             5
## 4                                    0                             0
## 5                                    0                             0
## 6                                    2                             0
##   Efficiency.of.check.in.staff Check.in.wait.time
## 1                            5                  0
## 2                            5                  0
## 3                            5                  5
## 4                            4                  0
## 5                            4                  4
## 6                            4                  5
##   Courtesy.of.of.check.in.staff Wait.time.at.passport.inspection
## 1                             0                                3
## 2                             0                                2
## 3                             5                               NA
## 4                             0                               NA
## 5                             4                                5
## 6                             5                                0
##   Courtesy.of.inspection.staff Courtesy.of.security.staff
## 1                            4                          4
## 2                            3                          3
## 3                           NA                          5
## 4                           NA                          4
## 5                            5                          2
## 6                            0                          5
##   Thoroughness.of.security.inspection Wait.time.of.security.inspection
## 1                                   5                                2
## 2                                   0                                2
## 3                                   5                                5
## 4                                   4                                2
## 5                                   3                                2
## 6                                   5                                5
##   Feeling.of.safety.and.security
## 1                              4
## 2                              3
## 3                              5
## 4                              3
## 5                              3
## 6                              5
##   Ease.of.finding.your.way.through.the.airport Flight.information.screens
## 1                                            5                          5
## 2                                            5                          5
## 3                                            0                         NA
## 4                                            4                          4
## 5                                            4                          3
## 6                                            5                          5
##   Walking.distance.inside.terminal Ease.of.making.connections
## 1                                5                          0
## 2                                4                          0
## 3                                0                          0
## 4                                4                          0
## 5                                5                          0
## 6                                5                          0
##   Courtesy.of.airport.staff Restaurants Restaurants..value.for.money.
## 1                         0           0                             0
## 2                         0           4                             3
## 3                         5           5                             5
## 4                         0           0                             2
## 5                         4           4                             4
## 6                         5           5                             5
##   Availability.of.banks.ATM.money.changing Shopping.facilities
## 1                                        0                   0
## 2                                        0                   0
## 3                                        0                   5
## 4                                        0                   0
## 5                                        3                   4
## 6                                        0                   5
##   Shopping.facilities..value.for.money. Internet.access
## 1                                     0               0
## 2                                     0               4
## 3                                     0               0
## 4                                     0               0
## 5                                     3               2
## 6                                     5               5
##   Business.executive.lounges Availability.of.washrooms
## 1                          0                         4
## 2                          0                         0
## 3                          0                         5
## 4                          0                         4
## 5                          2                         4
## 6                          5                         5
##   Cleanliness.of.washrooms Comfort.of.waiting.gate.areas
## 1                        0                             4
## 2                        0                             4
## 3                        5                             5
## 4                        4                             4
## 5                        4                             2
## 6                        5                             5
##   Cleanliness.of.airport.terminal Ambience.of.airport
## 1                               5                   4
## 2                               4                   4
## 3                               5                   5
## 4                               4                   4
## 5                               5                   4
## 6                               5                   5
##   Arrivals.passport.and.visa.inspection Speed.of.baggage.delivery
## 1                                     4                         0
## 2                                     4                         0
## 3                                    NA                         0
## 4                                     4                         0
## 5                                     4                         0
## 6                                     5                         0
##   Customs.inspection Overall.satisfaction
## 1                  0                    0
## 2                  0                    0
## 3                  5                    0
## 4                  0                    0
## 5                  4                    0
## 6                  5                    0

The summary below shows that the data has multiple NA values and needs to be cleaned. The exploratory data analysis also showed that the airport changed its date and time format in 2016, so the “Departure.time” feature mixes two formats.

summary(survey.df)
##     Quarter        Date.recorded   Departure.time
##  2Q15   : 352   05/11/2017:  56   8:00    :  43  
##  1Q17   : 351   11/04/2016:  56   12:25 PM:  36  
##  4Q15   : 351   01/05/2017:  55   18:00   :  30  
##  1Q15   : 350   01/10/2017:  55   19:20   :  30  
##  1Q16   : 350   05/03/2017:  55   9:25    :  30  
##  2Q16   : 350   11/12/2016:  55   14:45   :  29  
##  (Other):1397   (Other)   :3169   (Other) :3303  
##  Ground.transportation.to.from.airport Parking.facilities
##  Min.   :0.000                         Min.   :0.00      
##  1st Qu.:0.000                         1st Qu.:0.00      
##  Median :2.000                         Median :0.00      
##  Mean   :2.191                         Mean   :1.13      
##  3rd Qu.:4.000                         3rd Qu.:3.00      
##  Max.   :5.000                         Max.   :5.00      
##  NA's   :54                            NA's   :39        
##  Parking.facilities..value.for.money. Availability.of.baggage.carts
##  Min.   :0.000                        Min.   :0.000                
##  1st Qu.:0.000                        1st Qu.:0.000                
##  Median :0.000                        Median :0.000                
##  Mean   :1.017                        Mean   :1.036                
##  3rd Qu.:2.000                        3rd Qu.:2.000                
##  Max.   :5.000                        Max.   :5.000                
##  NA's   :46                           NA's   :91                   
##  Efficiency.of.check.in.staff Check.in.wait.time
##  Min.   :0.000                Min.   :0.000     
##  1st Qu.:3.000                1st Qu.:3.000     
##  Median :5.000                Median :5.000     
##  Mean   :3.778                Mean   :3.789     
##  3rd Qu.:5.000                3rd Qu.:5.000     
##  Max.   :5.000                Max.   :5.000     
##  NA's   :38                   NA's   :39        
##  Courtesy.of.of.check.in.staff Wait.time.at.passport.inspection
##  Min.   :0.000                 Min.   :0.000                   
##  1st Qu.:3.000                 1st Qu.:2.000                   
##  Median :5.000                 Median :4.000                   
##  Mean   :3.778                 Mean   :3.347                   
##  3rd Qu.:5.000                 3rd Qu.:5.000                   
##  Max.   :5.000                 Max.   :5.000                   
##  NA's   :52                    NA's   :69                      
##  Courtesy.of.inspection.staff Courtesy.of.security.staff
##  Min.   :0.000                Min.   :0.000             
##  1st Qu.:3.000                1st Qu.:4.000             
##  Median :4.000                Median :4.000             
##  Mean   :3.456                Mean   :3.962             
##  3rd Qu.:5.000                3rd Qu.:5.000             
##  Max.   :5.000                Max.   :5.000             
##  NA's   :96                   NA's   :31                
##  Thoroughness.of.security.inspection Wait.time.of.security.inspection
##  Min.   :0.000                       Min.   :0.000                   
##  1st Qu.:4.000                       1st Qu.:3.000                   
##  Median :4.000                       Median :4.000                   
##  Mean   :4.082                       Mean   :4.019                   
##  3rd Qu.:5.000                       3rd Qu.:5.000                   
##  Max.   :5.000                       Max.   :5.000                   
##  NA's   :46                          NA's   :50                      
##  Feeling.of.safety.and.security
##  Min.   :0.000                 
##  1st Qu.:4.000                 
##  Median :5.000                 
##  Mean   :4.192                 
##  3rd Qu.:5.000                 
##  Max.   :5.000                 
##  NA's   :43                    
##  Ease.of.finding.your.way.through.the.airport Flight.information.screens
##  Min.   :0.000                                Min.   :0.000             
##  1st Qu.:4.000                                1st Qu.:4.000             
##  Median :5.000                                Median :5.000             
##  Mean   :4.506                                Mean   :4.229             
##  3rd Qu.:5.000                                3rd Qu.:5.000             
##  Max.   :5.000                                Max.   :5.000             
##  NA's   :36                                   NA's   :26                
##  Walking.distance.inside.terminal Ease.of.making.connections
##  Min.   :0.000                    Min.   :0.0000            
##  1st Qu.:4.000                    1st Qu.:0.0000            
##  Median :5.000                    Median :0.0000            
##  Mean   :4.397                    Mean   :0.3602            
##  3rd Qu.:5.000                    3rd Qu.:0.0000            
##  Max.   :5.000                    Max.   :5.0000            
##  NA's   :37                       NA's   :83                
##  Courtesy.of.airport.staff  Restaurants    Restaurants..value.for.money.
##  Min.   :0.00              Min.   :0.000   Min.   :0.000                
##  1st Qu.:3.00              1st Qu.:0.000   1st Qu.:0.000                
##  Median :4.00              Median :4.000   Median :3.000                
##  Mean   :3.59              Mean   :2.969   Mean   :2.548                
##  3rd Qu.:5.00              3rd Qu.:5.000   3rd Qu.:4.000                
##  Max.   :5.00              Max.   :5.000   Max.   :5.000                
##  NA's   :40                NA's   :59      NA's   :60                   
##  Availability.of.banks.ATM.money.changing Shopping.facilities
##  Min.   :0.0000                           Min.   :0.000      
##  1st Qu.:0.0000                           1st Qu.:0.000      
##  Median :0.0000                           Median :0.000      
##  Mean   :0.8991                           Mean   :1.885      
##  3rd Qu.:0.0000                           3rd Qu.:4.000      
##  Max.   :5.0000                           Max.   :5.000      
##  NA's   :41                               NA's   :46         
##  Shopping.facilities..value.for.money. Internet.access
##  Min.   :0.000                         Min.   :0.000  
##  1st Qu.:0.000                         1st Qu.:0.000  
##  Median :0.000                         Median :1.000  
##  Mean   :1.538                         Mean   :1.901  
##  3rd Qu.:3.000                         3rd Qu.:4.000  
##  Max.   :5.000                         Max.   :5.000  
##  NA's   :57                            NA's   :73     
##  Business.executive.lounges Availability.of.washrooms
##  Min.   :0.0000             Min.   :0.000            
##  1st Qu.:0.0000             1st Qu.:4.000            
##  Median :0.0000             Median :4.000            
##  Mean   :0.4842             Mean   :3.908            
##  3rd Qu.:0.0000             3rd Qu.:5.000            
##  Max.   :5.0000             Max.   :5.000            
##  NA's   :91                 NA's   :35               
##  Cleanliness.of.washrooms Comfort.of.waiting.gate.areas
##  Min.   :0.000            Min.   :0.000                
##  1st Qu.:3.000            1st Qu.:3.000                
##  Median :4.000            Median :4.000                
##  Mean   :3.801            Mean   :4.003                
##  3rd Qu.:5.000            3rd Qu.:5.000                
##  Max.   :5.000            Max.   :5.000                
##  NA's   :37               NA's   :41                   
##  Cleanliness.of.airport.terminal Ambience.of.airport
##  Min.   :0.000                   Min.   :0.000      
##  1st Qu.:4.000                   1st Qu.:4.000      
##  Median :5.000                   Median :4.000      
##  Mean   :4.377                   Mean   :4.232      
##  3rd Qu.:5.000                   3rd Qu.:5.000      
##  Max.   :5.000                   Max.   :5.000      
##  NA's   :32                      NA's   :54         
##  Arrivals.passport.and.visa.inspection Speed.of.baggage.delivery
##  Min.   :0.000                         Min.   :0.0000           
##  1st Qu.:0.000                         1st Qu.:0.0000           
##  Median :4.000                         Median :0.0000           
##  Mean   :2.644                         Mean   :0.9181           
##  3rd Qu.:5.000                         3rd Qu.:0.0000           
##  Max.   :5.000                         Max.   :5.0000           
##  NA's   :143                           NA's   :181              
##  Customs.inspection Overall.satisfaction
##  Min.   :0.000      Min.   :0.000       
##  1st Qu.:0.000      1st Qu.:0.000       
##  Median :0.000      Median :0.000       
##  Mean   :1.343      Mean   :1.826       
##  3rd Qu.:3.000      3rd Qu.:4.000       
##  Max.   :5.000      Max.   :5.000       
##  NA's   :201        NA's   :172

We tried two different methods of handling the NA values, so that we could compare them and check whether imputing values introduces bias:

1. Impute Data using K-NN Imputation

We decided to impute values only in rows with fewer than 6 missing values. If a passenger left 6 or more questions unanswered, we omitted that response, because a heavily imputed row would no longer reflect the passenger’s true opinions and would affect our model. This reduced our data from 3501 to 3434 survey responses.

length(unique (unlist (lapply (survey.df, function (x) which(is.na(x))))))
## [1] 969
969/3501
## [1] 0.2767781
df <- vector()
a <- vector()
df <- as.integer(apply(survey.df, 1, function(x) sum(is.na(x))))
for(i in 1:37){
  r <- sum(df==i)
  a[i] <- r
}
print(a)
##  [1] 509 162 140  64  27  14  12   9   9   7   8   0   3   3   0   0   1
## [18]   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
## [35]   0   0   0

969 of the 3501 rows (about 28%) contain at least one NA value.
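The per-row NA counts above can also be computed without an explicit loop. The following is a minimal sketch on a made-up toy data frame (the column names `q1` to `q3` and all values are assumptions for illustration, not survey data), using only base R:

```r
# Toy data frame standing in for the survey (values are made up)
toy <- data.frame(
  q1 = c(5, NA, 3, NA, 4),
  q2 = c(4, NA, NA, 2, 4),
  q3 = c(NA, NA, 1, 5, 5)
)

# Number of missing answers in each row
na_per_row <- rowSums(is.na(toy))
print(na_per_row)

# How many rows have 0, 1, 2, ... missing answers
print(table(na_per_row))

# Rows with at least one NA, and their share of all rows
rows_with_na <- sum(!complete.cases(toy))
print(rows_with_na / nrow(toy))

# Keep only rows with fewer than 2 missing answers (the threshold is arbitrary here)
toy_kept <- toy[na_per_row < 2, ]
print(dim(toy_kept))
```

On the real data, the same pattern would be applied to survey.df with a threshold of 6 instead of 2.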

survey.df$missing <- as.integer(apply(survey.df, 1, function(x) sum(is.na(x))))
survey_temp = survey.df[!survey.df$missing > 5, 1:37]
dim(survey_temp)
## [1] 3434   37
library(VIM)
## Loading required package: colorspace
## Loading required package: grid
## Loading required package: data.table
## VIM is ready to use. 
##  Since version 4.0.0 the GUI is in its own package VIMGUI.
## 
##           Please use the package to use the new (and old) GUI.
## Suggestions and bug-reports can be submitted at: https://github.com/alexkowa/VIM/issues
## 
## Attaching package: 'VIM'
## The following object is masked from 'package:datasets':
## 
##     sleep
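impu <- kNN(survey_temp, variable = colnames(survey_temp), metric = NULL, k = 5)
# kNN() appends logical "_imp" indicator columns by default; keep only the 37 original columns
survey_imputed <- impu[, 1:37]
dim(survey_imputed)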
## [1] 3434   37

These are the dimensions of the dataset after imputation. We deleted 67 rows and imputed the missing values in the rest.
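To give an intuition for what k-NN imputation does, here is a simplified base R sketch. It is not the algorithm used by VIM: as far as we understand, kNN() in VIM uses a Gower-type distance and also handles categorical variables, whereas this sketch assumes numeric columns only and uses the mean absolute difference over the columns two rows have in common. The function name `knn_impute_sketch` and the toy data are hypothetical.

```r
# Simplified k-NN imputation for numeric columns (illustration only)
knn_impute_sketch <- function(df, k = 2) {
  m <- as.matrix(df)
  out <- m
  for (i in seq_len(nrow(m))) {
    miss <- which(is.na(m[i, ]))
    if (length(miss) == 0) next
    obs <- which(!is.na(m[i, ]))
    others <- setdiff(seq_len(nrow(m)), i)
    # Distance to every other row: mean absolute difference over the
    # columns that are observed in both rows (Inf if there are none)
    d <- sapply(others, function(j) {
      shared <- obs[!is.na(m[j, obs])]
      if (length(shared) == 0) return(Inf)
      mean(abs(m[i, shared] - m[j, shared]))
    })
    ord <- order(d)
    nearest <- others[ord][is.finite(d[ord])]
    for (cc in miss) {
      # Use the k closest rows that actually have a value in this column
      donors <- head(nearest[!is.na(m[nearest, cc])], k)
      if (length(donors) > 0) out[i, cc] <- median(m[donors, cc])
    }
  }
  as.data.frame(out)
}

# Toy ratings: row 5 is missing its answer in column b
toy <- data.frame(
  a = c(5, 5, 1, 1, 5),
  b = c(4, 4, 2, 2, NA),
  c = c(5, 4, 1, 2, 5)
)
imputed <- knn_impute_sketch(toy, k = 2)
print(imputed)
```

Row 5 is closest to rows 1 and 2, so its missing value is filled with the median of their answers in column b.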

2. Omitting data

After omitting the rows with missing values, the dimensions of the dataset are as follows.

survey_omit <- na.omit(survey.df)
survey_omit$missing <- NULL
dim(survey_omit)
## [1] 2532   37

We used both datasets to run various tests for the key driver analysis.
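One simple way to check whether the two datasets differ systematically is to compare the mean of each rating across them. The sketch below is only an illustration of that idea, not the comparison we ran: the helper `compare_means` and the toy data frames are hypothetical.

```r
# Compare column means between two data frames on their shared numeric columns
compare_means <- function(df_a, df_b) {
  shared <- intersect(names(df_a), names(df_b))
  is_num <- vapply(
    shared,
    function(nm) is.numeric(df_a[[nm]]) && is.numeric(df_b[[nm]]),
    logical(1)
  )
  cols <- shared[is_num]
  res <- data.frame(
    feature = cols,
    mean_a = vapply(cols, function(nm) mean(df_a[[nm]], na.rm = TRUE), numeric(1)),
    mean_b = vapply(cols, function(nm) mean(df_b[[nm]], na.rm = TRUE), numeric(1)),
    row.names = NULL
  )
  res$difference <- res$mean_a - res$mean_b
  res
}

# Toy stand-ins for an imputed and a complete-case dataset (made-up values)
imputed_toy <- data.frame(q1 = c(5, 4, 3, 2), q2 = c(2, 2, 4, 4), tag = c("a", "b", "c", "d"))
omitted_toy <- data.frame(q1 = c(5, 3), q2 = c(2, 4), tag = c("a", "c"))

res <- compare_means(imputed_toy, omitted_toy)
print(res)
```

With the real data this would be called as compare_means(survey_imputed, survey_omit). Large differences in a feature would suggest that the two treatments of missing values lead to different pictures of that feature.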

levels(survey.df$Departure.time)
##   [1] "1:00 PM"  "1:05 PM"  "1:09 PM"  "1:10 PM"  "1:15 PM"  "1:18 PM" 
##   [7] "1:20 AM"  "1:35 PM"  "1:42 PM"  "1:45 PM"  "1:49 PM"  "1:50 PM" 
##  [13] "1:53 PM"  "10:00"    "10:00 AM" "10:01 AM" "10:04"    "10:05"   
##  [19] "10:05 AM" "10:09"    "10:10"    "10:10 AM" "10:15"    "10:15 AM"
##  [25] "10:20"    "10:20 AM" "10:25"    "10:30"    "10:30 AM" "10:34 AM"
##  [31] "10:35"    "10:35 AM" "10:38"    "10:40"    "10:40 AM" "10:42"   
##  [37] "10:43 AM" "10:44 AM" "10:45"    "10:45 AM" "10:48"    "10:50"   
##  [43] "10:50 AM" "10:54"    "10:55"    "10:55 AM" "10:59"    "11:00"   
##  [49] "11:00 AM" "11:02"    "11:05"    "11:05 AM" "11:10"    "11:10 AM"
##  [55] "11:13 AM" "11:15"    "11:15 AM" "11:20"    "11:21"    "11:25"   
##  [61] "11:25 AM" "11:30 AM" "11:34"    "11:35"    "11:35 AM" "11:40"   
##  [67] "11:40 AM" "11:41 AM" "11:44 AM" "11:45"    "11:45 AM" "11:46"   
##  [73] "11:47"    "11:50"    "11:52"    "11:53"    "11:55"    "11:55 AM"
##  [79] "11:56 AM" "11:59"    "12:00"    "12:00 PM" "12:01 PM" "12:05"   
##  [85] "12:05 PM" "12:07"    "12:10 PM" "12:12 PM" "12:14"    "12:15"   
##  [91] "12:15 PM" "12:20"    "12:20 PM" "12:24"    "12:25 PM" "12:27"   
##  [97] "12:35"    "12:35 PM" "12:39 PM" "12:40"    "12:40 PM" "12:42 PM"
## [103] "12:45"    "12:45 PM" "12:48 PM" "12:50"    "12:50 PM" "12:52 PM"
## [109] "12:54"    "12:55"    "12:55 PM" "12:56"    "12:57"    "13:00"   
## [115] "13:05"    "13:06"    "13:10"    "13:15"    "13:20"    "13:25"   
## [121] "13:30"    "13:32"    "13:35"    "13:37"    "13:40"    "13:45"   
## [127] "13:46"    "13:50"    "14:00"    "14:05"    "14:10"    "14:11"   
## [133] "14:13"    "14:15"    "14:18"    "14:23"    "14:25"    "14:26"   
## [139] "14:31"    "14:35"    "14:40"    "14:45"    "14:47"    "14:50"   
## [145] "14:55"    "14:59"    "15:05"    "15:10"    "15:15"    "15:20"   
## [151] "15:22"    "15:25"    "15:29"    "15:30"    "15:32"    "15:33"   
## [157] "15:35"    "15:50"    "15:51"    "16:00"    "16:05"    "16:08"   
## [163] "16:10"    "16:15"    "16:22"    "16:24"    "16:25"    "16:28"   
## [169] "16:29"    "16:30"    "16:35"    "16:40"    "16:41"    "16:45"   
## [175] "16:49"    "16:53"    "16:55"    "17:00"    "17:03"    "17:04"   
## [181] "17:05"    "17:10"    "17:15"    "17:16"    "17:20"    "17:23"   
## [187] "17:25"    "17:26"    "17:30"    "17:35"    "17:40"    "17:43"   
## [193] "17:45"    "17:55"    "17:59"    "18:00"    "18:05"    "18:10"   
## [199] "18:15"    "18:20"    "18:25"    "18:26"    "18:30"    "18:35"   
## [205] "18:40"    "18:45"    "18:47"    "18:50"    "18:55"    "18:56"   
## [211] "19:00"    "19:11"    "19:15"    "19:16"    "19:20"    "19:23"   
## [217] "19:30"    "19:35"    "19:39"    "19:40"    "19:42"    "19:45"   
## [223] "19:50"    "19:55"    "2:07 PM"  "2:15 PM"  "2:20 PM"  "2:40 PM" 
## [229] "2:45 PM"  "2:55 PM"  "20:00"    "20:05"    "20:06"    "20:10"   
## [235] "20:35"    "20:50"    "21:05"    "21:35"    "21:51"    "3:00 PM" 
## [241] "3:05 PM"  "3:12 PM"  "3:15 PM"  "3:30 PM"  "3:35 PM"  "3:50 PM" 
## [247] "4:05 PM"  "4:35 PM"  "4:40 PM"  "5:05 PM"  "5:10 PM"  "5:20 PM" 
## [253] "5:25 PM"  "5:35 PM"  "5:55 PM"  "6:10"     "6:20 PM"  "6:25 AM" 
## [259] "6:30"     "6:35 AM"  "6:35 PM"  "6:36"     "6:38"     "6:40 PM" 
## [265] "6:45 PM"  "6:50"     "6:50 AM"  "6:50 PM"  "6:54"     "6:55 PM" 
## [271] "6:57 PM"  "7:00"     "7:00 PM"  "7:15"     "7:20"     "7:20 PM" 
## [277] "7:29"     "7:30"     "7:30 PM"  "7:35 PM"  "7:40"     "7:45"    
## [283] "7:45 AM"  "7:49"     "7:50 AM"  "8:00"     "8:05"     "8:05 AM" 
## [289] "8:10"     "8:10 AM"  "8:10 PM"  "8:15"     "8:25"     "8:26"    
## [295] "8:30"     "8:34 AM"  "8:35"     "8:35 AM"  "8:39"     "8:40"    
## [301] "8:40 PM"  "8:45"     "8:45 AM"  "8:45 PM"  "8:46 PM"  "8:50"    
## [307] "8:50 AM"  "8:53"     "8:54"     "8:55"     "8:55 PM"  "9:04"    
## [313] "9:05"     "9:10"     "9:10 AM"  "9:16"     "9:20"     "9:20 AM" 
## [319] "9:25"     "9:25 PM"  "9:30"     "9:30 AM"  "9:35"     "9:40"    
## [325] "9:45"     "9:50"     "9:51 AM"  "9:51 PM"  "9:55"     "9:55 PM" 
## [331] "9:57 PM"

The second step in cleaning the data was to format the Departure.time feature. Since the survey data contains two time formats (12-hour and 24-hour), we decided to bin the departure times by time of day using regular expressions.

We binned “Departure.time” into Early Morning, Morning, Day, Evening and Night:

+ 01.00 - 07.59 is Early Morning
+ 08.00 - 11.59 is Morning
+ 12.00 - 16.59 is Day
+ 17.00 - 19.59 is Evening
+ 20.00 - 00.59 is Night
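The binning rule can be expressed compactly by first converting every time string to an hour of day and then looking up the bin. The following is a sketch in base R under the assumption that every value looks like "H:MM", optionally followed by AM or PM. The helper names `to_hour24` and `bin_departure` are hypothetical, and this is an alternative illustration rather than the code used below.

```r
# Convert mixed 12-hour / 24-hour clock strings to an hour of day (0-23)
to_hour24 <- function(x) {
  x <- trimws(as.character(x))
  hour <- suppressWarnings(as.integer(sub(":.*$", "", x)))
  is_am <- grepl("am", x, ignore.case = TRUE)
  is_pm <- grepl("pm", x, ignore.case = TRUE)
  midnight <- which(is_am & hour == 12)   # 12:xx AM is just after midnight
  hour[midnight] <- 0L
  afternoon <- which(is_pm & hour < 12)   # 1 PM to 11 PM become 13 to 23
  hour[afternoon] <- hour[afternoon] + 12L
  hour[which(hour < 0 | hour > 23)] <- NA # anything else is treated as invalid
  hour
}

# Map each time string to the time-of-day bins listed above
bin_departure <- function(x) {
  h <- to_hour24(x)
  labels <- c("Night", "Early Morning", "Morning", "Day", "Evening", "Night")
  # Lower bounds of the bins: 0, 1, 8, 12, 17 and 20 o'clock
  labels[findInterval(h, c(0, 1, 8, 12, 17, 20))]
}

times <- c("7:49", "12:25 PM", "1:20 AM", "18:00", "9:57 PM", "12:05", "8:10 PM", "11:45")
bins <- bin_departure(times)
print(data.frame(times, bins))
```

Because the function is vectorized, it could be applied to a whole column at once, for example bin_departure(survey_imputed$Departure.time), instead of looping over rows.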

library(stringr)
survey_imputed$Departure.time <- as.character(survey_imputed$Departure.time)
for (i in 1:nrow(survey_imputed)) {
if (str_detect(survey_imputed$Departure.time[i], regex(".am", ignore_case = TRUE)))
  {
    if(str_detect(survey_imputed$Departure.time[i], regex("^12:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Night"}
    else if(str_detect(survey_imputed$Departure.time[i], regex("^1:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Early Morning"}
    else if(str_detect(survey_imputed$Departure.time[i], regex("^2:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Early Morning"}
    else if(str_detect(survey_imputed$Departure.time[i], regex("^3:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Early Morning"}
    else if(str_detect(survey_imputed$Departure.time[i], regex("^4:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Early Morning"}
    else if(str_detect(survey_imputed$Departure.time[i], regex("^5:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Early Morning"}
    else if(str_detect(survey_imputed$Departure.time[i], regex("^6:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Early Morning"}
    else if(str_detect(survey_imputed$Departure.time[i], regex("^7:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Early Morning"}
    else if(str_detect(survey_imputed$Departure.time[i], regex("^8:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Morning"}
    else if(str_detect(survey_imputed$Departure.time[i], regex("^9:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Morning"}
    else if(str_detect(survey_imputed$Departure.time[i], regex("^10:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Morning"}
    else if(str_detect(survey_imputed$Departure.time[i], regex("^11:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Morning"}
    else {survey_imputed$Departure.time[i] <- NA}
  } else if (str_detect(survey_imputed$Departure.time[i], regex(".pm", ignore_case = TRUE)))
  {
  if(str_detect(survey_imputed$Departure.time[i], regex("^12:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Day"}
  else if(str_detect(survey_imputed$Departure.time[i], regex("^1:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Day"}
  else if(str_detect(survey_imputed$Departure.time[i], regex("^2:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Day"}
  else if(str_detect(survey_imputed$Departure.time[i], regex("^3:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Day"}
  else if(str_detect(survey_imputed$Departure.time[i], regex("^4:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Day"}
  else if(str_detect(survey_imputed$Departure.time[i], regex("^5:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Evening"}
  else if(str_detect(survey_imputed$Departure.time[i], regex("^6:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Evening"}
  else if(str_detect(survey_imputed$Departure.time[i], regex("^7:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Evening"}
  else if(str_detect(survey_imputed$Departure.time[i], regex("^8:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Night"}
  else if(str_detect(survey_imputed$Departure.time[i], regex("^9:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Night"}
  else if(str_detect(survey_imputed$Departure.time[i], regex("^10:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Night"}
  else if(str_detect(survey_imputed$Departure.time[i], regex("^11:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Night"}
  } else if (str_detect(survey_imputed$Departure.time[i], regex("^00:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Night"
  } else if (str_detect(survey_imputed$Departure.time[i], regex("^1:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Early Morning"
  } else if (str_detect(survey_imputed$Departure.time[i], regex("^2:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Early Morning"
  } else if (str_detect(survey_imputed$Departure.time[i], regex("^3:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Early Morning"
  } else if (str_detect(survey_imputed$Departure.time[i], regex("^4:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Early Morning"
  }else if (str_detect(survey_imputed$Departure.time[i], regex("^5:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Early Morning"
  }else if (str_detect(survey_imputed$Departure.time[i], regex("^6:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Early Morning"
  }else if (str_detect(survey_imputed$Departure.time[i], regex("^7:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Early Morning"
  }else if (str_detect(survey_imputed$Departure.time[i], regex("^8:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Morning"
  }else if (str_detect(survey_imputed$Departure.time[i], regex("^9:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Morning"
  }else if (str_detect(survey_imputed$Departure.time[i], regex("^10:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Morning"
  }else if (str_detect(survey_imputed$Departure.time[i], regex("^11:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Morning"
  }else if (str_detect(survey_imputed$Departure.time[i], regex("^12:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Day"
  }else if (str_detect(survey_imputed$Departure.time[i], regex("^13:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Day"
  }else if (str_detect(survey_imputed$Departure.time[i], regex("^14:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Day"
  }else if (str_detect(survey_imputed$Departure.time[i], regex("^15:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Day"
  }else if (str_detect(survey_imputed$Departure.time[i], regex("^16:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Day"
  }else if (str_detect(survey_imputed$Departure.time[i], regex("^17:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Evening"
  }else if (str_detect(survey_imputed$Departure.time[i], regex("^18:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Evening"
  }else if (str_detect(survey_imputed$Departure.time[i], regex("^19:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Evening"
  }else if (str_detect(survey_imputed$Departure.time[i], regex("^20:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Night"
  }else if (str_detect(survey_imputed$Departure.time[i], regex("^21:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Night"
  }else if (str_detect(survey_imputed$Departure.time[i], regex("^22:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Night"
  }else if (str_detect(survey_imputed$Departure.time[i], regex("^23:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Night"
  }else {survey_imputed$Departure.time[i] <- NA}
}
survey_omit$Departure.time.char <- as.character(survey_omit$Departure.time)
survey_omit$Departure.time.bin <- survey_omit$Departure.time.char
for (i in 1:nrow(survey_omit)) {
if (str_detect(survey_omit$Departure.time.char[i], regex(".am", ignore_case = TRUE)))
  {
    if(str_detect(survey_omit$Departure.time.char[i], regex("^12:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Night"}
    else if(str_detect(survey_omit$Departure.time.char[i], regex("^1:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Early Morning"}
    else if(str_detect(survey_omit$Departure.time.char[i], regex("^2:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Early Morning"}
    else if(str_detect(survey_omit$Departure.time.char[i], regex("^3:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Early Morning"}
    else if(str_detect(survey_omit$Departure.time.char[i], regex("^4:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Early Morning"}
    else if(str_detect(survey_omit$Departure.time.char[i], regex("^5:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Early Morning"}
    else if(str_detect(survey_omit$Departure.time.char[i], regex("^6:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Early Morning"}
    else if(str_detect(survey_omit$Departure.time.char[i], regex("^7:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Early Morning"}
    else if(str_detect(survey_omit$Departure.time.char[i], regex("^8:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Morning"}
    else if(str_detect(survey_omit$Departure.time.char[i], regex("^9:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Morning"}
    else if(str_detect(survey_omit$Departure.time.char[i], regex("^10:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Morning"}
    else if(str_detect(survey_omit$Departure.time.char[i], regex("^11:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Morning"}
    else {survey_omit$Departure.time.bin[i] <- NA}
  } else if (str_detect(survey_omit$Departure.time.char[i], regex(".pm", ignore_case = TRUE)))
  {
  if(str_detect(survey_omit$Departure.time.char[i], regex("^12:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Day"}
  else if(str_detect(survey_omit$Departure.time.char[i], regex("^1:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Day"}
  else if(str_detect(survey_omit$Departure.time.char[i], regex("^2:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Day"}
  else if(str_detect(survey_omit$Departure.time.char[i], regex("^3:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Day"}
  else if(str_detect(survey_omit$Departure.time.char[i], regex("^4:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Day"}
  else if(str_detect(survey_omit$Departure.time.char[i], regex("^5:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Evening"}
  else if(str_detect(survey_omit$Departure.time.char[i], regex("^6:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Evening"}
  else if(str_detect(survey_omit$Departure.time.char[i], regex("^7:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Evening"}
  else if(str_detect(survey_omit$Departure.time.char[i], regex("^8:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Night"}
  else if(str_detect(survey_omit$Departure.time.char[i], regex("^9:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Night"}
  else if(str_detect(survey_omit$Departure.time.char[i], regex("^10:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Night"}
  else if(str_detect(survey_omit$Departure.time.char[i], regex("^11:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Night"}
  } else if (str_detect(survey_omit$Departure.time.char[i], regex("^00:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Night"
  } else if (str_detect(survey_omit$Departure.time.char[i], regex("^1:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Early Morning"
  } else if (str_detect(survey_omit$Departure.time.char[i], regex("^2:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Early Morning"
  } else if (str_detect(survey_omit$Departure.time.char[i], regex("^3:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Early Morning"
  } else if (str_detect(survey_omit$Departure.time.char[i], regex("^4:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Early Morning"
  }else if (str_detect(survey_omit$Departure.time.char[i], regex("^5:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Early Morning"
  }else if (str_detect(survey_omit$Departure.time.char[i], regex("^6:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Early Morning"
  }else if (str_detect(survey_omit$Departure.time.char[i], regex("^7:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Early Morning"
  }else if (str_detect(survey_omit$Departure.time.char[i], regex("^8:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Morning"
  }else if (str_detect(survey_omit$Departure.time.char[i], regex("^9:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Morning"
  }else if (str_detect(survey_omit$Departure.time.char[i], regex("^10:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Morning"
  }else if (str_detect(survey_omit$Departure.time.char[i], regex("^11:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Morning"
  }else if (str_detect(survey_omit$Departure.time.char[i], regex("^12:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Day"
  }else if (str_detect(survey_omit$Departure.time.char[i], regex("^13:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Day"
  }else if (str_detect(survey_omit$Departure.time.char[i], regex("^14:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Day"
  }else if (str_detect(survey_omit$Departure.time.char[i], regex("^15:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Day"
  }else if (str_detect(survey_omit$Departure.time.char[i], regex("^16:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Day"
  }else if (str_detect(survey_omit$Departure.time.char[i], regex("^17:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Evening"
  }else if (str_detect(survey_omit$Departure.time.char[i], regex("^18:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Evening"
  }else if (str_detect(survey_omit$Departure.time.char[i], regex("^19:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Evening"
  }else if (str_detect(survey_omit$Departure.time.char[i], regex("^20:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Night"
  }else if (str_detect(survey_omit$Departure.time.char[i], regex("^21:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Night"
  }else if (str_detect(survey_omit$Departure.time.char[i], regex("^22:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Night"
  }else if (str_detect(survey_omit$Departure.time.char[i], regex("^23:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Night"
  }else {survey_omit$Departure.time.bin[i] <- NA}
}

survey_omit$Departure.time <- survey_omit$Departure.time.bin
survey_omit$Departure.time.char <- NULL
survey_omit$Departure.time.bin <- NULL
survey_omit$Departure.time <- as.factor(survey_omit$Departure.time)
head(survey_imputed)
##   Quarter Date.recorded Departure.time
## 1    3Q16    09/04/2016        Morning
## 2    2Q16    05/01/2016            Day
## 3    2Q16    04/07/2016        Morning
## 4    3Q16    09/02/2016        Evening
## 5    3Q16    08/04/2016  Early Morning
## 6    3Q16    08/02/2016        Morning
##   Ground.transportation.to.from.airport Parking.facilities
## 1                                     0                  0
## 2                                     0                  0
## 3                                     4                  4
## 4                                     0                  0
## 5                                     5                  0
## 6                                     5                  5
##   Parking.facilities..value.for.money. Availability.of.baggage.carts
## 1                                    0                             0
## 2                                    0                             0
## 3                                    4                             5
## 4                                    0                             0
## 5                                    0                             0
## 6                                    2                             0
##   Efficiency.of.check.in.staff Check.in.wait.time
## 1                            5                  0
## 2                            5                  0
## 3                            5                  5
## 4                            4                  0
## 5                            4                  4
## 6                            4                  5
##   Courtesy.of.of.check.in.staff Wait.time.at.passport.inspection
## 1                             0                                3
## 2                             0                                2
## 3                             5                                5
## 4                             0                                3
## 5                             4                                5
## 6                             5                                0
##   Courtesy.of.inspection.staff Courtesy.of.security.staff
## 1                            4                          4
## 2                            3                          3
## 3                            5                          5
## 4                            3                          4
## 5                            5                          2
## 6                            0                          5
##   Thoroughness.of.security.inspection Wait.time.of.security.inspection
## 1                                   5                                2
## 2                                   0                                2
## 3                                   5                                5
## 4                                   4                                2
## 5                                   3                                2
## 6                                   5                                5
##   Feeling.of.safety.and.security
## 1                              4
## 2                              3
## 3                              5
## 4                              3
## 5                              3
## 6                              5
##   Ease.of.finding.your.way.through.the.airport Flight.information.screens
## 1                                            5                          5
## 2                                            5                          5
## 3                                            0                          5
## 4                                            4                          4
## 5                                            4                          3
## 6                                            5                          5
##   Walking.distance.inside.terminal Ease.of.making.connections
## 1                                5                          0
## 2                                4                          0
## 3                                0                          0
## 4                                4                          0
## 5                                5                          0
## 6                                5                          0
##   Courtesy.of.airport.staff Restaurants Restaurants..value.for.money.
## 1                         0           0                             0
## 2                         0           4                             3
## 3                         5           5                             5
## 4                         0           0                             2
## 5                         4           4                             4
## 6                         5           5                             5
##   Availability.of.banks.ATM.money.changing Shopping.facilities
## 1                                        0                   0
## 2                                        0                   0
## 3                                        0                   5
## 4                                        0                   0
## 5                                        3                   4
## 6                                        0                   5
##   Shopping.facilities..value.for.money. Internet.access
## 1                                     0               0
## 2                                     0               4
## 3                                     0               0
## 4                                     0               0
## 5                                     3               2
## 6                                     5               5
##   Business.executive.lounges Availability.of.washrooms
## 1                          0                         4
## 2                          0                         0
## 3                          0                         5
## 4                          0                         4
## 5                          2                         4
## 6                          5                         5
##   Cleanliness.of.washrooms Comfort.of.waiting.gate.areas
## 1                        0                             4
## 2                        0                             4
## 3                        5                             5
## 4                        4                             4
## 5                        4                             2
## 6                        5                             5
##   Cleanliness.of.airport.terminal Ambience.of.airport
## 1                               5                   4
## 2                               4                   4
## 3                               5                   5
## 4                               4                   4
## 5                               5                   4
## 6                               5                   5
##   Arrivals.passport.and.visa.inspection Speed.of.baggage.delivery
## 1                                     4                         0
## 2                                     4                         0
## 3                                     5                         0
## 4                                     4                         0
## 5                                     4                         0
## 6                                     5                         0
##   Customs.inspection Overall.satisfaction
## 1                  0                    0
## 2                  0                    0
## 3                  5                    0
## 4                  0                    0
## 5                  4                    0
## 6                  5                    0
head(survey_omit)
##    Quarter Date.recorded Departure.time
## 1     3Q16    09/04/2016        Morning
## 2     2Q16    05/01/2016            Day
## 5     3Q16    08/04/2016  Early Morning
## 6     3Q16    08/02/2016        Morning
## 7     2Q16    05/06/2016        Evening
## 13    3Q16    07/11/2016        Evening
##    Ground.transportation.to.from.airport Parking.facilities
## 1                                      0                  0
## 2                                      0                  0
## 5                                      5                  0
## 6                                      5                  5
## 7                                      2                  3
## 13                                     5                  0
##    Parking.facilities..value.for.money. Availability.of.baggage.carts
## 1                                     0                             0
## 2                                     0                             0
## 5                                     0                             0
## 6                                     2                             0
## 7                                     3                             3
## 13                                    0                             0
##    Efficiency.of.check.in.staff Check.in.wait.time
## 1                             5                  0
## 2                             5                  0
## 5                             4                  4
## 6                             4                  5
## 7                             5                  5
## 13                            0                  0
##    Courtesy.of.of.check.in.staff Wait.time.at.passport.inspection
## 1                              0                                3
## 2                              0                                2
## 5                              4                                5
## 6                              5                                0
## 7                              5                                5
## 13                             0                                1
##    Courtesy.of.inspection.staff Courtesy.of.security.staff
## 1                             4                          4
## 2                             3                          3
## 5                             5                          2
## 6                             0                          5
## 7                             5                          4
## 13                            1                          5
##    Thoroughness.of.security.inspection Wait.time.of.security.inspection
## 1                                    5                                2
## 2                                    0                                2
## 5                                    3                                2
## 6                                    5                                5
## 7                                    4                                4
## 13                                   5                                4
##    Feeling.of.safety.and.security
## 1                               4
## 2                               3
## 5                               3
## 6                               5
## 7                               3
## 13                              0
##    Ease.of.finding.your.way.through.the.airport Flight.information.screens
## 1                                             5                          5
## 2                                             5                          5
## 5                                             4                          3
## 6                                             5                          5
## 7                                             5                          5
## 13                                            5                          5
##    Walking.distance.inside.terminal Ease.of.making.connections
## 1                                 5                          0
## 2                                 4                          0
## 5                                 5                          0
## 6                                 5                          0
## 7                                 4                          0
## 13                                3                          0
##    Courtesy.of.airport.staff Restaurants Restaurants..value.for.money.
## 1                          0           0                             0
## 2                          0           4                             3
## 5                          4           4                             4
## 6                          5           5                             5
## 7                          4           4                             3
## 13                         0           0                             0
##    Availability.of.banks.ATM.money.changing Shopping.facilities
## 1                                         0                   0
## 2                                         0                   0
## 5                                         3                   4
## 6                                         0                   5
## 7                                         0                   0
## 13                                        0                   0
##    Shopping.facilities..value.for.money. Internet.access
## 1                                      0               0
## 2                                      0               4
## 5                                      3               2
## 6                                      5               5
## 7                                      0               0
## 13                                     0               0
##    Business.executive.lounges Availability.of.washrooms
## 1                           0                         4
## 2                           0                         0
## 5                           2                         4
## 6                           5                         5
## 7                           0                         4
## 13                          0                         0
##    Cleanliness.of.washrooms Comfort.of.waiting.gate.areas
## 1                         0                             4
## 2                         0                             4
## 5                         4                             2
## 6                         5                             5
## 7                         4                             4
## 13                        0                             4
##    Cleanliness.of.airport.terminal Ambience.of.airport
## 1                                5                   4
## 2                                4                   4
## 5                                5                   4
## 6                                5                   5
## 7                                4                   4
## 13                               5                   4
##    Arrivals.passport.and.visa.inspection Speed.of.baggage.delivery
## 1                                      4                         0
## 2                                      4                         0
## 5                                      4                         0
## 6                                      5                         0
## 7                                      4                         0
## 13                                     3                         0
##    Customs.inspection Overall.satisfaction
## 1                   0                    0
## 2                   0                    0
## 5                   4                    0
## 6                   5                    0
## 7                   4                    0
## 13                  5                    0
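The hour-by-hour loops above can be condensed into one vectorized step. The following is a minimal sketch, not the code used in this analysis: `bin_departure_time` is a hypothetical helper, and it assumes departure times look like `"11:45"`, `"7:49"`, or `"9:30 AM"`. It parses the hour, converts am/pm values to a 24-hour clock, and assigns the same five labels with `cut()`. Unlike the loops, it also accepts leading zeros such as `"07:49"`.

```r
# Hypothetical helper (not part of the original analysis): bin departure
# times into the five labels used by the loops above.
bin_departure_time <- function(x) {
  x <- trimws(as.character(x))
  # Extract the leading hour; unparseable strings become NA.
  hour <- suppressWarnings(as.integer(sub("^([0-9]{1,2}):.*$", "\\1", x)))
  is_am <- grepl("am", x, ignore.case = TRUE)
  is_pm <- grepl("pm", x, ignore.case = TRUE)
  # Convert 12-hour clock values to a 0-23 hour.
  hour24 <- ifelse(is_am & hour == 12L, 0L,
            ifelse(is_pm & hour != 12L, hour + 12L, hour))
  # Intervals: [0,1) Night, [1,8) Early Morning, [8,12) Morning,
  # [12,17) Day, [17,20) Evening, [20,24) Night.
  idx <- cut(hour24, breaks = c(0, 1, 8, 12, 17, 20, 24),
             right = FALSE, labels = FALSE)
  bins <- c("Night", "Early Morning", "Morning", "Day", "Evening", "Night")
  factor(bins[idx],
         levels = c("Early Morning", "Morning", "Day", "Evening", "Night"))
}

# Example call on a few made-up values.
binned <- bin_departure_time(c("11:45", "16:45", "7:49", "17:16",
                               "12:30 AM", "8:05 pm", "bad"))
print(binned)
```

Applied to the data, this would read `survey_omit$Departure.time <- bin_departure_time(survey_omit$Departure.time)`, run on the raw time column in place of the loop.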

Exploratory Data Analysis

The correlation plots for the two datasets were similar and looked like the ones below.

library(corrplot)
## corrplot 0.84 loaded
corrplot(cor(survey_imputed[,4:36]), method = 'color', tl.cex = 0.3) #Imputed

corrplot(cor(survey_omit[,4:36]), method = "color", tl.cex = 0.3) #Omitted

Checking our target variable

summary(as.factor(survey_imputed$Overall.satisfaction))
##    0    1    2    3    4    5 
## 2036    1   10  134  546  707
summary(as.factor(survey_omit$Overall.satisfaction))
##    0    1    2    3    4    5 
## 1508    1    5   81  408  529
#survey_omit[(survey_omit$Overall.satisfaction == 0),]

As we can see, the data is skewed towards zero. On further investigating the survey responses with an overall satisfaction of “0”, we came across many instances where the respondents had given ratings of “4” and “5” to individual services. This could mean one of two things: either the respondents did not fill in the overall satisfaction, or the overall satisfaction really was “0”. We assume that the respondents did enter “0” as their overall satisfaction and analyze the data further on this assumption.
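One way to quantify this conflict is to count the responses whose overall satisfaction is 0 although at least one individual service was rated 4 or 5. The sketch below uses a small made-up data frame (`toy`) in place of `survey_omit`; the column names are borrowed from the survey, but the values are purely illustrative.

```r
# Toy data standing in for survey_omit; the values are made up.
toy <- data.frame(
  Restaurants          = c(5, 0, 4, 2),
  Internet.access      = c(4, 0, 5, 3),
  Overall.satisfaction = c(0, 0, 5, 0)
)

service_cols <- setdiff(names(toy), "Overall.satisfaction")
zero_overall <- toy$Overall.satisfaction == 0
# TRUE when a respondent rated at least one individual service 4 or 5.
any_high <- rowSums(toy[, service_cols, drop = FALSE] >= 4) > 0

# Cross-tabulate the two conditions and count the conflicting responses.
print(table(zero_overall, any_high))
n_conflict <- sum(zero_overall & any_high)
print(n_conflict)
```

On the real data, `service_cols` would be the rating columns (columns 4 to 36), and a large `n_conflict` would suggest that many zeros are unfilled answers rather than genuine ratings.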

df2 <- na.omit(survey.df)
library(stringr)
df2$Departure.time.char <- as.character(df2$Departure.time)
for (i in 1:nrow(df2)) {
  if (str_detect(df2$Departure.time.char[i], regex(".am", ignore_case = TRUE)))
  {
    df2$Quarter.new[i] <- as.character(df2$Quarter[i])
  } else if (str_detect(df2$Departure.time.char[i], regex(".pm", ignore_case = TRUE)))
  {
    df2$Quarter.new[i] <- as.character(df2$Quarter[i])
  } else {df2$Quarter.new[i] <- NA}  # set NA for this row only, not the whole column
}
df2$Quarter.new <- as.factor(df2$Quarter.new)
summary(df2$Quarter.new)
## 1Q17 2Q17 4Q16 NA's 
##  266  237  252 1777

From the result above, we can see that the time format used to record responses changed after the 3rd quarter of 2016: only responses from 4Q16, 1Q17, and 2Q17 carry an am/pm marker.
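The same check can be done without a loop by flagging am/pm markers with `grepl()` and cross-tabulating the flag against the quarter. This is a minimal sketch on made-up rows (`toy`), not the survey data itself.

```r
# Toy rows standing in for df2; the values are made up.
toy <- data.frame(
  Quarter        = c("3Q16", "4Q16", "1Q17", "2Q16", "4Q16"),
  Departure.time = c("11:45", "9:30 AM", "4:15 PM", "16:45", "7:00 pm"),
  stringsAsFactors = FALSE
)

# TRUE when the recorded time carries an am/pm marker.
has_ampm <- grepl("(am|pm)", toy$Departure.time, ignore.case = TRUE)

# Rows are quarters, columns show whether the am/pm format was used.
format_by_quarter <- table(Quarter = toy$Quarter, has_ampm = has_ampm)
print(format_by_quarter)
```

A quarter whose responses all fall in one column was recorded in a single time format, which makes a format change easy to spot.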

Plotting the distribution of each service present at the airport

for (i in 4:36){
  hist(survey_omit[,i], xlab = "Satisfaction Levels", main = names(survey_omit[i]))
}

plot(aggregate(survey_omit$Overall.satisfaction ~ survey_omit$Quarter, data=survey_omit, mean))

plot(aggregate(survey_imputed$Overall.satisfaction ~ survey_imputed$Quarter, data=survey_imputed, mean)) 

The overall satisfaction aggregated by quarter shows that for some quarters the average is 0, while for others it is close to 5. 2Q17, 1Q17, 4Q16, and 1Q15 have an average overall satisfaction close to 5, while the other quarters have an average overall satisfaction of 0.

This shows that the overall satisfaction is highly skewed across quarters. This could be because of seasonal repairs or holiday seasons, but we do not have any data to support these claims. Another strong possibility is that the system defaulted unfilled responses to 0, but the occurrence of NAs in these quarters hints otherwise. Given the inconclusive nature of this data, we decided to drop Quarter as a predictor for the overall satisfaction; including it would cause the model to assign high and undue importance to this feature.
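To see how strongly the zeros drive the quarterly averages, the mean can be tabulated next to the share of zero ratings per quarter. The sketch below uses made-up values (`toy`), so the numbers are illustrative only.

```r
# Toy data standing in for survey_omit; the values are made up.
toy <- data.frame(
  Quarter              = c("3Q16", "3Q16", "4Q16", "4Q16", "4Q16"),
  Overall.satisfaction = c(0, 0, 5, 4, 5)
)

# Mean rating and share of zero ratings within each quarter.
mean_by_q       <- tapply(toy$Overall.satisfaction, toy$Quarter, mean)
zero_share_by_q <- tapply(toy$Overall.satisfaction == 0, toy$Quarter, mean)

by_quarter <- data.frame(
  Quarter    = names(mean_by_q),
  mean       = as.vector(mean_by_q),
  share_zero = as.vector(zero_share_by_q)
)
print(by_quarter)
```

A quarter with `share_zero` equal to 1 has no usable overall ratings at all, which is the pattern that makes Quarter unsuitable as a predictor here.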

PCA

survey_imputed$Departure.time <- as.factor(survey_imputed$Departure.time)

sur_sc <- data.frame(scale(survey_imputed[,4:36], center = TRUE, scale = TRUE))

sur_sc$Overall.satisfaction = survey_imputed$Overall.satisfaction
print("MEAN")
## [1] "MEAN"
apply(sur_sc[1:33],2,mean)
##        Ground.transportation.to.from.airport 
##                                 6.775815e-17 
##                           Parking.facilities 
##                                -8.442936e-17 
##         Parking.facilities..value.for.money. 
##                                 4.408046e-17 
##                Availability.of.baggage.carts 
##                                 7.381137e-17 
##                 Efficiency.of.check.in.staff 
##                                 1.734218e-18 
##                           Check.in.wait.time 
##                                 9.137336e-17 
##                Courtesy.of.of.check.in.staff 
##                                -3.485663e-17 
##             Wait.time.at.passport.inspection 
##                                 3.019399e-17 
##                 Courtesy.of.inspection.staff 
##                                 8.707463e-17 
##                   Courtesy.of.security.staff 
##                                 1.100621e-16 
##          Thoroughness.of.security.inspection 
##                                 1.011253e-16 
##             Wait.time.of.security.inspection 
##                                -1.846987e-16 
##               Feeling.of.safety.and.security 
##                                -5.348140e-17 
## Ease.of.finding.your.way.through.the.airport 
##                                -2.956977e-17 
##                   Flight.information.screens 
##                                -2.732491e-16 
##             Walking.distance.inside.terminal 
##                                -1.681233e-16 
##                   Ease.of.making.connections 
##                                 3.766217e-17 
##                    Courtesy.of.airport.staff 
##                                -1.985491e-17 
##                                  Restaurants 
##                                -7.437384e-17 
##                Restaurants..value.for.money. 
##                                -1.090117e-16 
##     Availability.of.banks.ATM.money.changing 
##                                 1.423198e-17 
##                          Shopping.facilities 
##                                 2.392596e-17 
##        Shopping.facilities..value.for.money. 
##                                -1.481193e-17 
##                              Internet.access 
##                                 2.330427e-17 
##                   Business.executive.lounges 
##                                 3.975873e-17 
##                    Availability.of.washrooms 
##                                -9.299667e-17 
##                     Cleanliness.of.washrooms 
##                                -9.809902e-17 
##                Comfort.of.waiting.gate.areas 
##                                 4.206730e-19 
##              Cleanliness.of.airport.terminal 
##                                -4.046136e-16 
##                          Ambience.of.airport 
##                                -1.120128e-16 
##        Arrivals.passport.and.visa.inspection 
##                                 4.037103e-17 
##                    Speed.of.baggage.delivery 
##                                 1.039331e-17 
##                           Customs.inspection 
##                                 2.466383e-17
print("STANDARD DEVIATION")
## [1] "STANDARD DEVIATION"
apply(sur_sc[1:33],2,sd)
##        Ground.transportation.to.from.airport 
##                                            1 
##                           Parking.facilities 
##                                            1 
##         Parking.facilities..value.for.money. 
##                                            1 
##                Availability.of.baggage.carts 
##                                            1 
##                 Efficiency.of.check.in.staff 
##                                            1 
##                           Check.in.wait.time 
##                                            1 
##                Courtesy.of.of.check.in.staff 
##                                            1 
##             Wait.time.at.passport.inspection 
##                                            1 
##                 Courtesy.of.inspection.staff 
##                                            1 
##                   Courtesy.of.security.staff 
##                                            1 
##          Thoroughness.of.security.inspection 
##                                            1 
##             Wait.time.of.security.inspection 
##                                            1 
##               Feeling.of.safety.and.security 
##                                            1 
## Ease.of.finding.your.way.through.the.airport 
##                                            1 
##                   Flight.information.screens 
##                                            1 
##             Walking.distance.inside.terminal 
##                                            1 
##                   Ease.of.making.connections 
##                                            1 
##                    Courtesy.of.airport.staff 
##                                            1 
##                                  Restaurants 
##                                            1 
##                Restaurants..value.for.money. 
##                                            1 
##     Availability.of.banks.ATM.money.changing 
##                                            1 
##                          Shopping.facilities 
##                                            1 
##        Shopping.facilities..value.for.money. 
##                                            1 
##                              Internet.access 
##                                            1 
##                   Business.executive.lounges 
##                                            1 
##                    Availability.of.washrooms 
##                                            1 
##                     Cleanliness.of.washrooms 
##                                            1 
##                Comfort.of.waiting.gate.areas 
##                                            1 
##              Cleanliness.of.airport.terminal 
##                                            1 
##                          Ambience.of.airport 
##                                            1 
##        Arrivals.passport.and.visa.inspection 
##                                            1 
##                    Speed.of.baggage.delivery 
##                                            1 
##                           Customs.inspection 
##                                            1
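The means printed above are on the order of 1e-16 rather than exactly zero. That is floating-point rounding, not a failure of the centering. A more compact way to run the same check is sketched below. It uses the built-in mtcars data as a stand-in for the survey columns, and the variable names are illustrative assumptions rather than part of the original analysis.

```r
# Stand-in data: the built-in mtcars data set
# (hypothetical substitute for the survey ratings).
x_sc <- data.frame(scale(mtcars, center = TRUE, scale = TRUE))

# Column means and standard deviations after scaling.
col_means <- colMeans(x_sc)
col_sds   <- apply(x_sc, 2, sd)

# all.equal() compares with a numerical tolerance, so values such as
# 1e-17 are treated as zero.
means_ok <- isTRUE(all.equal(unname(col_means), rep(0, ncol(x_sc))))
sds_ok   <- isTRUE(all.equal(unname(col_sds), rep(1, ncol(x_sc))))

print(c(means_ok = means_ok, sds_ok = sds_ok))
```

A tolerance-based comparison avoids reading through one value per column when the only question is whether the scaling worked.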
my.pca <- prcomp(sur_sc[,1:33])
summary(my.pca)
## Importance of components:
##                           PC1    PC2     PC3     PC4    PC5     PC6
## Standard deviation     2.4658 1.8448 1.64178 1.40541 1.3679 1.31664
## Proportion of Variance 0.1842 0.1031 0.08168 0.05985 0.0567 0.05253
## Cumulative Proportion  0.1842 0.2874 0.36906 0.42892 0.4856 0.53815
##                            PC7     PC8     PC9    PC10   PC11    PC12
## Standard deviation     1.19923 1.15935 1.11358 1.01213 0.9950 0.95811
## Proportion of Variance 0.04358 0.04073 0.03758 0.03104 0.0300 0.02782
## Cumulative Proportion  0.58173 0.62246 0.66004 0.69108 0.7211 0.74890
##                           PC13   PC14   PC15    PC16    PC17    PC18
## Standard deviation     0.91171 0.8731 0.8423 0.79820 0.77573 0.73189
## Proportion of Variance 0.02519 0.0231 0.0215 0.01931 0.01824 0.01623
## Cumulative Proportion  0.77408 0.7972 0.8187 0.83799 0.85623 0.87246
##                           PC19    PC20    PC21    PC22    PC23    PC24
## Standard deviation     0.70936 0.70165 0.68390 0.64808 0.60025 0.57900
## Proportion of Variance 0.01525 0.01492 0.01417 0.01273 0.01092 0.01016
## Cumulative Proportion  0.88771 0.90263 0.91680 0.92953 0.94044 0.95060
##                           PC25    PC26    PC27    PC28    PC29    PC30
## Standard deviation     0.56753 0.51751 0.47409 0.43400 0.41887 0.36757
## Proportion of Variance 0.00976 0.00812 0.00681 0.00571 0.00532 0.00409
## Cumulative Proportion  0.96036 0.96848 0.97529 0.98100 0.98631 0.99041
##                           PC31    PC32    PC33
## Standard deviation     0.35444 0.32215 0.29515
## Proportion of Variance 0.00381 0.00314 0.00264
## Cumulative Proportion  0.99422 0.99736 1.00000
library(factoextra)
## Loading required package: ggplot2
sur.agg <- aggregate(sur_sc[, 1:33], list(sur_sc$Overall.satisfaction), mean)
sur.agg
##   Group.1 Ground.transportation.to.from.airport Parking.facilities
## 1       0                           0.013166801        0.001845229
## 2       1                          -0.543213638       -0.611255614
## 3       2                          -0.039547660       -0.235754401
## 4       3                          -0.081918475       -0.082831733
## 5       4                          -0.023278729       -0.021771764
## 6       5                          -0.003085792        0.031398512
##   Parking.facilities..value.for.money. Availability.of.baggage.carts
## 1                          -0.00804023                  -0.007157468
## 2                          -0.59951807                   1.673061392
## 3                          -0.07313439                   0.214722884
## 4                          -0.05392968                  -0.026380152
## 5                          -0.03285758                  -0.109283853
## 6                           0.06063308                   0.104605713
##   Efficiency.of.check.in.staff Check.in.wait.time
## 1                  -0.03562549        -0.02643162
## 2                  -0.45338318         0.12254995
## 3                   0.01363379        -0.10899697
## 4                  -0.39674866        -0.43471783
## 5                  -0.12621469        -0.11175348
## 6                   0.27571156         0.24618356
##   Courtesy.of.of.check.in.staff Wait.time.at.passport.inspection
## 1                   -0.01692423                      -0.06188028
## 2                    0.12717674                       0.32881176
## 3                   -0.09678476                       0.17574268
## 4                   -0.43272701                      -0.28422610
## 5                   -0.16405525                       0.03257917
## 6                    0.25863929                       0.20396051
##   Courtesy.of.inspection.staff Courtesy.of.security.staff
## 1                  -0.04540544                -0.02816605
## 2                   0.28224017                 0.02999378
## 3                  -0.03192928                -0.60061024
## 4                  -0.25309831                -0.60270179
## 5                  -0.02655886                -0.16506485
## 6                   0.19929114                 0.33127246
##   Thoroughness.of.security.inspection Wait.time.of.security.inspection
## 1                         -0.01993093                      -0.04758989
## 2                         -0.06672955                      -0.01418078
## 3                         -0.53920229                      -0.78714605
## 4                         -0.61324653                      -0.61986252
## 5                         -0.16047414                      -0.10761615
## 6                          0.30527869                       0.34879583
##   Feeling.of.safety.and.security
## 1                    -0.02682494
## 2                    -0.15948523
## 3                    -0.32569836
## 4                    -0.58742202
## 5                    -0.17775041
## 6                     0.33069070
##   Ease.of.finding.your.way.through.the.airport Flight.information.screens
## 1                                 0.0003949052               -0.008624559
## 2                                -4.0844841290               -0.173149770
## 3                                -1.4095195728               -0.620775600
## 4                                -0.7863448319               -0.418119130
## 5                                -0.1842929342               -0.115761843
## 6                                 0.3159403146                0.202509814
##   Walking.distance.inside.terminal Ease.of.making.connections
## 1                      -0.01872067                -0.11990041
## 2                      -0.44235061                -0.28523955
## 3                      -1.54389667                -0.03892456
## 4                      -0.83693427                 0.14979439
## 5                      -0.15990291                 0.08017280
## 6                       0.35849051                 0.25593342
##   Courtesy.of.airport.staff   Restaurants Restaurants..value.for.money.
## 1               0.009178752 -0.0001026923                   0.006636042
## 2              -1.385210770  0.5268619239                   0.799912152
## 3              -0.045565117 -0.4930049570                  -0.356097165
## 4              -0.493446649 -0.2228163430                  -0.247644266
## 5              -0.213388858 -0.1166255128                  -0.125621210
## 6               0.234490933  0.1388220499                   0.128746239
##   Availability.of.banks.ATM.money.changing Shopping.facilities
## 1                             0.0164804314          0.01111814
## 2                             1.2487263830          0.55683549
## 3                             0.2428184378         -0.18952402
## 4                            -0.0406727390         -0.16353140
## 5                            -0.0593224102         -0.13302306
## 6                             0.0008615471          0.10360067
##   Shopping.facilities..value.for.money. Internet.access
## 1                            0.01352222    0.0006782325
## 2                            0.81865530    1.0757406225
## 3                           -0.13147311    0.0012813531
## 4                           -0.15733262   -0.2110140818
## 5                           -0.10874864   -0.0543813720
## 6                            0.07556460    0.0784988408
##   Business.executive.lounges Availability.of.washrooms
## 1                0.010035796               -0.01217199
## 2                1.900912708               -0.63086746
## 3                0.385829909               -1.04936048
## 4                0.204924500               -0.55799554
## 5               -0.091448921               -0.17098502
## 6               -0.005262893                0.28859386
##   Cleanliness.of.washrooms Comfort.of.waiting.gate.areas
## 1              -0.02129018                   0.004792325
## 2              -0.52448293                  -0.975717379
## 3              -1.04919505                  -1.561147806
## 4              -0.49022001                  -0.982998852
## 5              -0.13767591                  -0.312729929
## 6               0.27612981                   0.437485727
##   Cleanliness.of.airport.terminal Ambience.of.airport
## 1                    -0.008228918         -0.01580088
## 2                    -1.644644511         -1.41109555
## 3                    -1.405638618         -1.75322562
## 4                    -1.127392952         -1.13024252
## 5                    -0.300783538         -0.38136343
## 6                     0.491872099          0.58103376
##   Arrivals.passport.and.visa.inspection Speed.of.baggage.delivery
## 1                             0.7515050                -0.5028502
## 2                            -1.2237881                 1.8107023
## 3                            -0.8585528                 0.5281696
## 4                            -1.0091442                 0.5098974
## 5                            -1.0941831                 0.5849718
## 6                            -1.1140108                 0.8896601
##   Customs.inspection
## 1          0.4282349
## 2         -0.6954321
## 3         -0.4355155
## 4         -0.6023276
## 5         -0.6364034
## 6         -0.6204349
sur.mean.sc <- data.frame(scale(sur.agg[,2:33], center = TRUE, scale = TRUE))
print(sur.mean.sc)
##   Ground.transportation.to.from.airport Parking.facilities
## 1                             0.5913909          0.6356218
## 2                            -2.0169942         -1.8800852
## 3                             0.3442586         -0.3393093
## 4                             0.1456186          0.2881709
## 5                             0.4205295          0.5387154
## 6                             0.5151966          0.7568864
##   Parking.facilities..value.for.money. Availability.of.baggage.carts
## 1                            0.4563933                    -0.4651890
## 2                           -2.0028623                     2.0128470
## 3                            0.1857438                    -0.1379534
## 4                            0.2655935                    -0.4935392
## 5                            0.3532074                    -0.6158080
## 6                            0.7419243                    -0.3003574
##   Efficiency.of.check.in.staff Check.in.wait.time
## 1                   0.31205617          0.1101886
## 2                  -1.22503081          0.7473897
## 3                   0.49329951         -0.2429472
## 4                  -1.01665120         -1.6360705
## 5                  -0.02125545         -0.2547370
## 6                   1.45758177          1.2761764
##   Courtesy.of.of.check.in.staff Wait.time.at.passport.inspection
## 1                     0.1545975                       -0.5825872
## 2                     0.7536472                        1.1996489
## 3                    -0.1773950                        0.5013873
## 4                    -1.5739582                       -1.5968713
## 5                    -0.4570487                       -0.1516876
## 6                     1.3001571                        0.6301098
##   Courtesy.of.inspection.staff Courtesy.of.security.staff
## 1                   -0.3442981                 0.39035044
## 2                    1.3607252                 0.54759318
## 3                   -0.2741700                -1.15732751
## 4                   -1.4251036                -1.16298231
## 5                   -0.2462231                 0.02022659
## 6                    0.9290697                 1.36213961
##   Thoroughness.of.security.inspection Wait.time.of.security.inspection
## 1                          0.47250980                        0.3724913
## 2                          0.33639177                        0.4517512
## 3                         -1.03783786                       -1.3820350
## 4                         -1.25320221                       -0.9851708
## 5                          0.06372716                        0.2300847
## 6                          1.41841133                        1.3128785
##   Feeling.of.safety.and.security
## 1                    0.427505956
## 2                   -0.005671361
## 3                   -0.548409218
## 4                   -1.403018828
## 5                   -0.065312884
## 6                    1.594906335
##   Ease.of.finding.your.way.through.the.airport Flight.information.screens
## 1                                    0.6327953                 0.61477584
## 2                                   -1.8887736                 0.05398164
## 3                                   -0.2375357                -1.47177824
## 4                                    0.1471460                -0.78101133
## 5                                    0.5187887                 0.24959189
## 6                                    0.8275793                 1.33444020
##   Walking.distance.inside.terminal Ease.of.making.connections
## 1                      0.626099413                 -0.6485807
## 2                     -0.002668923                 -1.4938016
## 3                     -1.637627203                 -0.2346287
## 4                     -0.588325620                  0.7301108
## 5                      0.416551149                  0.3742022
## 6                      1.185971184                  1.2726980
##   Courtesy.of.airport.staff Restaurants Restaurants..value.for.money.
## 1                 0.5625514  0.08008581                   -0.06701729
## 2                -1.8522564  1.60318522                    1.85320820
## 3                 0.4677458 -1.34456231                   -0.94505905
## 4                -0.3078967 -0.56362919                   -0.68253504
## 5                 0.1771082 -0.25670310                   -0.38716275
## 6                 0.9527477  0.48162357                    0.22856594
##   Availability.of.banks.ATM.money.changing Shopping.facilities
## 1                              -0.42936481         -0.07037704
## 2                               1.99389957          1.86985781
## 3                               0.01573856         -0.78373701
## 4                              -0.54175896         -0.69132326
## 5                              -0.57843434         -0.58285438
## 6                              -0.46008001          0.25843389
##   Shopping.facilities..value.for.money. Internet.access
## 1                           -0.19297586      -0.3181960
## 2                            1.97977281       1.9964587
## 3                           -0.58426322      -0.3168975
## 4                           -0.65404823      -0.7739785
## 5                           -0.52293851      -0.4367417
## 6                           -0.02554699      -0.1506450
##   Business.executive.lounges Availability.of.washrooms
## 1                -0.51772901                 0.7111755
## 2                 1.98731633                -0.5705337
## 3                -0.01987466                -1.4374971
## 4                -0.25953926                -0.4195699
## 5                -0.65217659                 0.3821734
## 6                -0.53799681                 1.3342516
##   Cleanliness.of.washrooms Comfort.of.waiting.gate.areas
## 1                0.6522375                     0.7713291
## 2               -0.4303433                    -0.5558663
## 3               -1.5592213                    -1.3482915
## 4               -0.3566292                    -0.5657224
## 5                0.4018425                     0.3415382
## 6                1.2921138                     1.3570129
##   Cleanliness.of.airport.terminal Ambience.of.airport
## 1                       0.7723444           0.7474366
## 2                      -1.1496854          -0.8107155
## 3                      -0.8689642          -1.1927787
## 4                      -0.5421545          -0.4970816
## 5                       0.4287284           0.3392059
## 6                       1.3597313           1.4139332
##   Arrivals.passport.and.visa.inspection Speed.of.baggage.delivery
## 1                             2.0141409               -1.53070971
## 2                            -0.6214530                1.57682828
## 3                            -0.1341269               -0.14585541
## 4                            -0.3350580               -0.17039839
## 5                            -0.4485236               -0.06955939
## 6                            -0.4749794                0.33969463
mean.pca <- prcomp(sur.mean.sc)
summary(mean.pca)
## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5       PC6
## Standard deviation     3.9059 3.5682 1.50550 1.27027 0.36331 6.919e-16
## Proportion of Variance 0.4768 0.3979 0.07083 0.05042 0.00412 0.000e+00
## Cumulative Proportion  0.4768 0.8746 0.94545 0.99588 1.00000 1.000e+00
fviz_eig(mean.pca, type=c("barplot", "lines"))

screeplot(mean.pca, type='line')

biplot(mean.pca, col = 'purple', cex=0.5, expand=1)

The scree plot shows that two components are enough to explain 87% of the variance in the data. However, since the features are highly associated, we want to dig further in order to assess the real drivers of Overall Satisfaction and provide recommendations.
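The cumulative share of variance read off the scree plot can also be computed directly from the component standard deviations. The sketch below uses the built-in USArrests data as a stand-in for the aggregated survey means (an assumption for illustration). The 0.87 threshold only mirrors the figure quoted in the text.

```r
# Stand-in data: built-in USArrests (hypothetical substitute for the
# aggregated survey means).
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# The variance of each component is its squared standard deviation.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)

# Smallest number of components whose cumulative share reaches the
# chosen threshold.
threshold <- 0.87
n_comp <- which(cum_var >= threshold)[1]

print(round(cum_var, 4))
print(n_comp)
```

The values in `cum_var` correspond to the "Cumulative Proportion" row that `summary()` prints for a `prcomp` object.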

Modeling

We tried three methods to understand what impacts the overall satisfaction of the passengers:

+ Linear Modeling
+ Relative Importance
+ Random Forest for feature importance

All three methods give similar results, which makes us more confident in our recommendations.

Linear Modeling using imputed data

set.seed(27705)
drop <- c("Date.recorded")
survey_without_date <- survey_imputed[,!(names(survey_imputed) %in% drop)]
survey_without_date[3:36]<-data.frame(scale(survey_without_date[3:36]))
model <- lm(formula= Overall.satisfaction~0+. , data=survey_without_date[3:36])
#head(survey_without_date[3:36])
summary(model)
## 
## Call:
## lm(formula = Overall.satisfaction ~ 0 + ., data = survey_without_date[3:36])
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.34012 -0.16777 -0.04877  0.10972  2.50513 
## 
## Coefficients:
##                                                Estimate Std. Error t value
## Ground.transportation.to.from.airport         0.0010338  0.0068947   0.150
## Parking.facilities                            0.0058597  0.0156369   0.375
## Parking.facilities..value.for.money.         -0.0041471  0.0156867  -0.264
## Availability.of.baggage.carts                -0.0012865  0.0076643  -0.168
## Efficiency.of.check.in.staff                  0.0156076  0.0123877   1.260
## Check.in.wait.time                            0.0040762  0.0141078   0.289
## Courtesy.of.of.check.in.staff                -0.0088048  0.0145479  -0.605
## Wait.time.at.passport.inspection              0.0114993  0.0142199   0.809
## Courtesy.of.inspection.staff                  0.0033963  0.0143566   0.237
## Courtesy.of.security.staff                    0.0167158  0.0093947   1.779
## Thoroughness.of.security.inspection          -0.0068994  0.0107506  -0.642
## Wait.time.of.security.inspection              0.0117928  0.0102965   1.145
## Feeling.of.safety.and.security                0.0194412  0.0096302   2.019
## Ease.of.finding.your.way.through.the.airport  0.0050328  0.0080628   0.624
## Flight.information.screens                   -0.0013889  0.0072563  -0.191
## Walking.distance.inside.terminal              0.0283690  0.0080395   3.529
## Ease.of.making.connections                    0.0314559  0.0069883   4.501
## Courtesy.of.airport.staff                     0.0138858  0.0071778   1.935
## Restaurants                                  -0.0069938  0.0117292  -0.596
## Restaurants..value.for.money.                 0.0008908  0.0118580   0.075
## Availability.of.banks.ATM.money.changing     -0.0031834  0.0080316  -0.396
## Shopping.facilities                           0.0049635  0.0123461   0.402
## Shopping.facilities..value.for.money.        -0.0003346  0.0127314  -0.026
## Internet.access                               0.0043269  0.0068896   0.628
## Business.executive.lounges                   -0.0079730  0.0077794  -1.025
## Availability.of.washrooms                     0.0048031  0.0103291   0.465
## Cleanliness.of.washrooms                      0.0107972  0.0104092   1.037
## Comfort.of.waiting.gate.areas                 0.0199827  0.0083003   2.407
## Cleanliness.of.airport.terminal               0.0398335  0.0090232   4.415
## Ambience.of.airport                           0.1239953  0.0090130  13.757
## Arrivals.passport.and.visa.inspection        -0.8314021  0.0089837 -92.546
## Speed.of.baggage.delivery                     0.1167842  0.0081056  14.408
## Customs.inspection                           -0.0338725  0.0080715  -4.197
##                                              Pr(>|t|)    
## Ground.transportation.to.from.airport        0.880818    
## Parking.facilities                           0.707880    
## Parking.facilities..value.for.money.         0.791510    
## Availability.of.baggage.carts                0.866705    
## Efficiency.of.check.in.staff                 0.207782    
## Check.in.wait.time                           0.772648    
## Courtesy.of.of.check.in.staff                0.545067    
## Wait.time.at.passport.inspection             0.418757    
## Courtesy.of.inspection.staff                 0.813008    
## Courtesy.of.security.staff                   0.075282 .  
## Thoroughness.of.security.inspection          0.521065    
## Wait.time.of.security.inspection             0.252157    
## Feeling.of.safety.and.security               0.043589 *  
## Ease.of.finding.your.way.through.the.airport 0.532538    
## Flight.information.screens                   0.848215    
## Walking.distance.inside.terminal             0.000423 ***
## Ease.of.making.connections                   6.98e-06 ***
## Courtesy.of.airport.staff                    0.053128 .  
## Restaurants                                  0.551031    
## Restaurants..value.for.money.                0.940119    
## Availability.of.banks.ATM.money.changing     0.691860    
## Shopping.facilities                          0.687688    
## Shopping.facilities..value.for.money.        0.979033    
## Internet.access                              0.530025    
## Business.executive.lounges                   0.305484    
## Availability.of.washrooms                    0.641954    
## Cleanliness.of.washrooms                     0.299678    
## Comfort.of.waiting.gate.areas                0.016117 *  
## Cleanliness.of.airport.terminal              1.04e-05 ***
## Ambience.of.airport                           < 2e-16 ***
## Arrivals.passport.and.visa.inspection         < 2e-16 ***
## Speed.of.baggage.delivery                     < 2e-16 ***
## Customs.inspection                           2.78e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3748 on 3401 degrees of freedom
## Multiple R-squared:  0.8608, Adjusted R-squared:  0.8594 
## F-statistic: 637.3 on 33 and 3401 DF,  p-value: < 2.2e-16
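With 33 predictors, the key drivers are easier to read if the coefficient table is filtered and sorted. The sketch below shows one way to do that, using the built-in mtcars data as a stand-in for the survey. The column choice and the 5% cutoff are illustrative assumptions, not the authors' procedure.

```r
# Stand-in model on the built-in mtcars data (hypothetical substitute
# for the survey model). Predictors and response are standardized and
# the intercept is dropped, as in the fit above.
dat <- data.frame(scale(mtcars[, c("mpg", "wt", "hp", "qsec", "drat")]))
fit <- lm(mpg ~ 0 + ., data = dat)

# coef(summary(fit)) is a matrix with one row per predictor.
coefs <- as.data.frame(coef(summary(fit)))
names(coefs) <- c("estimate", "std_error", "t_value", "p_value")

# Keep predictors significant at the 5% level, largest effects first.
drivers <- coefs[coefs$p_value < 0.05, ]
drivers <- drivers[order(abs(drivers$estimate), decreasing = TRUE), ]
print(drivers)
```

Because all variables are standardized, the absolute size of each estimate is comparable across predictors, which is what makes the sorting meaningful.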

Relative Importance using omitted data

Relative importance expresses, as a percentage, how much each predictor contributes to explaining the target variable.

To find the relative importance of the various predictors of Overall Satisfaction, we need coefficients for the predictors with respect to the dependent variable, i.e. Overall Satisfaction. The coefficients can be obtained by fitting a regression model. We use a linear model to obtain these coefficients for all variables other than Quarter and Date Recorded, for the reasons mentioned before.

To get relative importance, we need to provide the intercept too. Hence, we have not eliminated the intercept from our model.
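The text does not say which relative importance metric is used. One common choice, assumed here purely for illustration, averages each predictor's contribution to R-squared over all orders in which predictors can enter the model (often called LMG). The sketch below works this out by hand for two predictors on the built-in mtcars data, which stands in for the survey.

```r
# Stand-in data: built-in mtcars, two predictors only (hypothetical
# substitute for the survey ratings).
r2 <- function(formula) summary(lm(formula, data = mtcars))$r.squared

r2_wt   <- r2(mpg ~ wt)
r2_hp   <- r2(mpg ~ hp)
r2_full <- r2(mpg ~ wt + hp)

# Average each predictor's R^2 contribution over both entry orders:
# entering first, and entering after the other predictor.
lmg_wt <- mean(c(r2_wt, r2_full - r2_hp))
lmg_hp <- mean(c(r2_hp, r2_full - r2_wt))

# The contributions add up to the full-model R^2, so dividing by it
# gives percentage shares that sum to 100.
shares <- 100 * c(wt = lmg_wt, hp = lmg_hp) / r2_full
print(round(shares, 1))
```

With many predictors the number of orderings grows quickly, so in practice this is done by a package rather than by hand.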

set.seed(27705)
model1 <- lm(Overall.satisfaction~.-Date.recorded-Quarter, data=survey_omit)
summary(model1)
## 
## Call:
## lm(formula = Overall.satisfaction ~ . - Date.recorded - Quarter, 
##     data = survey_omit)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.0680 -0.3051 -0.0535  0.2705  5.0746 
## 
## Coefficients:
##                                                Estimate Std. Error
## (Intercept)                                   0.9193608  0.1001718
## Departure.timeEarly Morning                  -0.0492286  0.0582559
## Departure.timeEvening                         0.0943968  0.0378137
## Departure.timeMorning                        -0.0384857  0.0327338
## Departure.timeNight                          -0.0491610  0.0627197
## Ground.transportation.to.from.airport        -0.0007028  0.0064613
## Parking.facilities                            0.0053897  0.0185858
## Parking.facilities..value.for.money.          0.0078831  0.0201707
## Availability.of.baggage.carts                -0.0101960  0.0090102
## Efficiency.of.check.in.staff                  0.0203886  0.0148824
## Check.in.wait.time                            0.0227398  0.0168559
## Courtesy.of.of.check.in.staff                -0.0289482  0.0171779
## Wait.time.at.passport.inspection             -0.0072406  0.0154614
## Courtesy.of.inspection.staff                  0.0164229  0.0159676
## Courtesy.of.security.staff                    0.0226648  0.0138249
## Thoroughness.of.security.inspection           0.0055238  0.0177721
## Wait.time.of.security.inspection              0.0233914  0.0165996
## Feeling.of.safety.and.security                0.0060408  0.0170105
## Ease.of.finding.your.way.through.the.airport  0.0005833  0.0203197
## Flight.information.screens                    0.0042216  0.0112018
## Walking.distance.inside.terminal              0.1044387  0.0194173
## Ease.of.making.connections                    0.0456396  0.0121427
## Courtesy.of.airport.staff                     0.0117223  0.0078067
## Restaurants                                   0.0017916  0.0124829
## Restaurants..value.for.money.                -0.0109063  0.0136850
## Availability.of.banks.ATM.money.changing     -0.0124474  0.0097654
## Shopping.facilities                          -0.0002189  0.0131586
## Shopping.facilities..value.for.money.         0.0140442  0.0152849
## Internet.access                               0.0030859  0.0072340
## Business.executive.lounges                   -0.0273259  0.0129352
## Availability.of.washrooms                     0.0133165  0.0152375
## Cleanliness.of.washrooms                      0.0140168  0.0142325
## Comfort.of.waiting.gate.areas                 0.0808887  0.0171749
## Cleanliness.of.airport.terminal               0.1453546  0.0227868
## Ambience.of.airport                           0.2878309  0.0217436
## Arrivals.passport.and.visa.inspection        -0.8848256  0.0083488
## Speed.of.baggage.delivery                     0.1269883  0.0097345
## Customs.inspection                           -0.0264663  0.0087083
##                                               t value Pr(>|t|)    
## (Intercept)                                     9.178  < 2e-16 ***
## Departure.timeEarly Morning                    -0.845 0.398169    
## Departure.timeEvening                           2.496 0.012611 *  
## Departure.timeMorning                          -1.176 0.239819    
## Departure.timeNight                            -0.784 0.433220    
## Ground.transportation.to.from.airport          -0.109 0.913390    
## Parking.facilities                              0.290 0.771849    
## Parking.facilities..value.for.money.            0.391 0.695965    
## Availability.of.baggage.carts                  -1.132 0.257909    
## Efficiency.of.check.in.staff                    1.370 0.170817    
## Check.in.wait.time                              1.349 0.177437    
## Courtesy.of.of.check.in.staff                  -1.685 0.092074 .  
## Wait.time.at.passport.inspection               -0.468 0.639608    
## Courtesy.of.inspection.staff                    1.029 0.303807    
## Courtesy.of.security.staff                      1.639 0.101253    
## Thoroughness.of.security.inspection             0.311 0.755970    
## Wait.time.of.security.inspection                1.409 0.158912    
## Feeling.of.safety.and.security                  0.355 0.722530    
## Ease.of.finding.your.way.through.the.airport    0.029 0.977100    
## Flight.information.screens                      0.377 0.706306    
## Walking.distance.inside.terminal                5.379 8.20e-08 ***
## Ease.of.making.connections                      3.759 0.000175 ***
## Courtesy.of.airport.staff                       1.502 0.133335    
## Restaurants                                     0.144 0.885889    
## Restaurants..value.for.money.                  -0.797 0.425554    
## Availability.of.banks.ATM.money.changing       -1.275 0.202555    
## Shopping.facilities                            -0.017 0.986732    
## Shopping.facilities..value.for.money.           0.919 0.358275    
## Internet.access                                 0.427 0.669714    
## Business.executive.lounges                     -2.113 0.034741 *  
## Availability.of.washrooms                       0.874 0.382241    
## Cleanliness.of.washrooms                        0.985 0.324795    
## Comfort.of.waiting.gate.areas                   4.710 2.62e-06 ***
## Cleanliness.of.airport.terminal                 6.379 2.12e-10 ***
## Ambience.of.airport                            13.238  < 2e-16 ***
## Arrivals.passport.and.visa.inspection        -105.983  < 2e-16 ***
## Speed.of.baggage.delivery                      13.045  < 2e-16 ***
## Customs.inspection                             -3.039 0.002397 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.667 on 2494 degrees of freedom
## Multiple R-squared:  0.9105, Adjusted R-squared:  0.9092 
## F-statistic: 685.7 on 37 and 2494 DF,  p-value: < 2.2e-16

From the model summary, we see that not all predictors have a statistically significant relationship with Overall Satisfaction, so we do not need their coefficients to compute relative importance. We therefore train another linear model using only the predictors that are statistically significant. At this stage we do not use the size of the coefficients to choose or eliminate predictors: even if a coefficient is small, that will be reflected in the relative importance.

The predictors we chose for this model are: Departure.time, Walking.distance.inside.terminal, Ease.of.making.connections, Business.executive.lounges, Comfort.of.waiting.gate.areas, Cleanliness.of.airport.terminal, Ambience.of.airport, Arrivals.passport.and.visa.inspection, Speed.of.baggage.delivery, Customs.inspection
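The selection step described above can also be done programmatically. The following is a minimal sketch, assuming a 0.05 threshold and using the built-in mtcars data as a stand-in for the survey (the variable names `fit`, `alpha` and `significant` are illustrative and do not come from the original analysis):

```r
# Toy example on the built-in mtcars data; the survey models are not refit here.
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Coefficient table: one row per term, with the p-value in column "Pr(>|t|)"
coefs <- summary(fit)$coefficients
p_values <- coefs[, "Pr(>|t|)"]

# Keep terms below the threshold, ignoring the intercept
alpha <- 0.05   # assumed significance threshold
significant <- setdiff(names(p_values)[p_values < alpha], "(Intercept)")
print(significant)
```

For a factor predictor such as Departure.time, the coefficient table has one row per level (for example Departure.timeEvening), so the row names would still need to be mapped back to the underlying variable before building a formula.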

fmla <- as.formula("Overall.satisfaction~Departure.time+Walking.distance.inside.terminal+Ease.of.making.connections+Business.executive.lounges+Comfort.of.waiting.gate.areas+Cleanliness.of.airport.terminal+Ambience.of.airport+Arrivals.passport.and.visa.inspection+Speed.of.baggage.delivery+Customs.inspection")
set.seed(27705)
model2 <- lm(formula=fmla, data=survey_omit)
summary(model2)
## 
## Call:
## lm(formula = fmla, data = survey_omit)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.0984 -0.3064 -0.0555  0.2643  5.2093 
## 
## Coefficients:
##                                        Estimate Std. Error  t value
## (Intercept)                            1.067240   0.092293   11.564
## Departure.timeEarly Morning           -0.065630   0.058156   -1.129
## Departure.timeEvening                  0.077955   0.037813    2.062
## Departure.timeMorning                 -0.028875   0.032643   -0.885
## Departure.timeNight                   -0.054591   0.062657   -0.871
## Walking.distance.inside.terminal       0.129388   0.017261    7.496
## Ease.of.making.connections             0.024027   0.011322    2.122
## Business.executive.lounges            -0.027009   0.011279   -2.395
## Comfort.of.waiting.gate.areas          0.106181   0.016202    6.554
## Cleanliness.of.airport.terminal        0.159379   0.022580    7.059
## Ambience.of.airport                    0.301419   0.021572   13.973
## Arrivals.passport.and.visa.inspection -0.885748   0.008371 -105.808
## Speed.of.baggage.delivery              0.135735   0.009565   14.191
## Customs.inspection                    -0.020322   0.008446   -2.406
##                                       Pr(>|t|)    
## (Intercept)                            < 2e-16 ***
## Departure.timeEarly Morning             0.2592    
## Departure.timeEvening                   0.0393 *  
## Departure.timeMorning                   0.3765    
## Departure.timeNight                     0.3837    
## Walking.distance.inside.terminal      9.06e-14 ***
## Ease.of.making.connections              0.0339 *  
## Business.executive.lounges              0.0167 *  
## Comfort.of.waiting.gate.areas         6.78e-11 ***
## Cleanliness.of.airport.terminal       2.17e-12 ***
## Ambience.of.airport                    < 2e-16 ***
## Arrivals.passport.and.visa.inspection  < 2e-16 ***
## Speed.of.baggage.delivery              < 2e-16 ***
## Customs.inspection                      0.0162 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6716 on 2518 degrees of freedom
## Multiple R-squared:  0.9084, Adjusted R-squared:  0.9079 
## F-statistic:  1920 on 13 and 2518 DF,  p-value: < 2.2e-16

We fit this model with the selected predictors. As the results show, it explains almost 91% of the variance in Overall Satisfaction (multiple R-squared of 0.9084). We can therefore use this smaller set of predictors to compute their relative importance for Overall Satisfaction.

To compute the relative importance, we used the R package relaimpo, written by Ulrike Grömping, who also maintains the CRAN Task View for Design of Experiments.

“The R package, relaimpo, implements several reasonable procedures from the statistical literature to assign something that looks like a percent contribution to each correlated predictor.” source: www.r-bloggers.com

calc.relimp() calculates the relative importance metrics for a linear model. We use the lmg method, which partitions R-squared by averaging each predictor's sequential contribution over all possible orderings of the predictors. This is the same idea as Shapley value regression.
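To make the averaging over orderings concrete, here is a small base-R sketch of the lmg idea. It is an illustration only, not the relaimpo implementation: it uses the built-in mtcars data with an arbitrary choice of three predictors, and the helper names `r_squared` and `permutations` are made up for this example.

```r
# Toy illustration of the lmg idea on built-in data (not the survey data).
predictors <- c("wt", "hp", "disp")   # arbitrary predictors, chosen for illustration
response <- "mpg"

# R-squared of a linear model that uses a given subset of predictors
r_squared <- function(vars) {
  if (length(vars) == 0) return(0)
  fit <- lm(reformulate(vars, response = response), data = mtcars)
  summary(fit)$r.squared
}

# All orderings of a vector (simple recursive helper)
permutations <- function(x) {
  if (length(x) <= 1) return(list(x))
  out <- list()
  for (i in seq_along(x)) {
    for (p in permutations(x[-i])) {
      out[[length(out) + 1]] <- c(x[i], p)
    }
  }
  out
}

# For every ordering, credit each predictor with the R-squared it adds
# when it enters the model, then average over all orderings.
orders <- permutations(predictors)
lmg <- setNames(numeric(length(predictors)), predictors)
for (ord in orders) {
  for (k in seq_along(ord)) {
    gain <- r_squared(ord[seq_len(k)]) - r_squared(ord[seq_len(k - 1)])
    lmg[ord[k]] <- lmg[ord[k]] + gain
  }
}
lmg <- lmg / length(orders)

# Normalizing to sum to 1 corresponds to the rela = TRUE option
lmg_share <- lmg / sum(lmg)
print(round(lmg_share, 3))
```

Before normalization the contributions add up to the R-squared of the full model, which is why the method can be read as a decomposition of the explained variance. The number of orderings grows factorially with the number of predictors, so this brute-force version is only practical for small examples.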

library(relaimpo)
## Loading required package: MASS
## Loading required package: boot
## Loading required package: survey
## Loading required package: Matrix
## Loading required package: survival
## 
## Attaching package: 'survival'
## The following object is masked from 'package:boot':
## 
##     aml
## 
## Attaching package: 'survey'
## The following object is masked from 'package:graphics':
## 
##     dotchart
## Loading required package: mitools
## This is the global version of package relaimpo.
## If you are a non-US user, a version with the interesting additional metric pmvd is available
## from Ulrike Groempings web site at prof.beuth-hochschule.de/groemping.
rel.imp <- calc.relimp(model2, type = c("lmg"), rela = TRUE)
rel.imp
## Response variable: Overall.satisfaction 
## Total response variance: 4.897195 
## Analysis based on 2532 observations 
## 
## 13 Regressors: 
## Some regressors combined in groups: 
##         Group  Departure.time : Departure.timeEarly Morning Departure.timeEvening Departure.timeMorning Departure.timeNight 
## 
##  Relative importance of 10 (groups of) regressors assessed: 
##  Departure.time Walking.distance.inside.terminal Ease.of.making.connections Business.executive.lounges Comfort.of.waiting.gate.areas Cleanliness.of.airport.terminal Ambience.of.airport Arrivals.passport.and.visa.inspection Speed.of.baggage.delivery Customs.inspection 
##  
## Proportion of variance explained by model: 90.84%
## Metrics are normalized to sum to 100% (rela=TRUE). 
## 
## Relative importance metrics: 
## 
##                                                lmg
## Departure.time                        0.0052825848
## Walking.distance.inside.terminal      0.0036392182
## Ease.of.making.connections            0.0066377888
## Business.executive.lounges            0.0009527126
## Comfort.of.waiting.gate.areas         0.0047996977
## Cleanliness.of.airport.terminal       0.0073470857
## Ambience.of.airport                   0.0114385209
## Arrivals.passport.and.visa.inspection 0.6681417591
## Speed.of.baggage.delivery             0.1733133758
## Customs.inspection                    0.1184472564
## 
## Average coefficients for different model sizes: 
## 
##                                            1group      2groups     3groups
## Departure.timeEarly Morning           -0.52860577 -0.447038136 -0.37413052
## Departure.timeEvening                 -0.45040920 -0.373157079 -0.30016956
## Departure.timeMorning                  0.19232677  0.132876462  0.08328254
## Departure.timeNight                    0.03324125  0.007613281 -0.01250115
## Walking.distance.inside.terminal       0.16175345  0.155457726  0.14923250
## Ease.of.making.connections             0.24195928  0.204047918  0.16936312
## Business.executive.lounges            -0.04616400 -0.039864042 -0.03486394
## Comfort.of.waiting.gate.areas          0.17087382  0.163808593  0.15596452
## Cleanliness.of.airport.terminal        0.25375250  0.252416940  0.24905747
## Ambience.of.airport                    0.26342528  0.268934744  0.27302731
## Arrivals.passport.and.visa.inspection -0.92243302 -0.919697368 -0.91688553
## Speed.of.baggage.delivery              0.78366314  0.700634815  0.61799527
## Customs.inspection                    -0.59238255 -0.513579281 -0.43777934
##                                           4groups     5groups      6groups
## Departure.timeEarly Morning           -0.30980087 -0.25358029 -0.204747479
## Departure.timeEvening                 -0.23209408 -0.16916964 -0.111309581
## Departure.timeMorning                  0.04333447  0.01255354 -0.009692403
## Departure.timeNight                   -0.02786588 -0.03922293 -0.047200685
## Walking.distance.inside.terminal       0.14332986  0.13809733  0.133862138
## Ease.of.making.connections             0.13833202  0.11108081  0.087520216
## Business.executive.lounges            -0.03100108 -0.02819378 -0.026381088
## Comfort.of.waiting.gate.areas          0.14757067  0.13897968  0.130590252
## Cleanliness.of.airport.terminal        0.24337208  0.23522000  0.224580358
## Ambience.of.airport                    0.27624553  0.27911402  0.282101698
## Arrivals.passport.and.visa.inspection -0.91374543 -0.91013886 -0.906024936
## Speed.of.baggage.delivery              0.53747739  0.46018876  0.386780227
## Customs.inspection                    -0.36542624 -0.29690348 -0.232520606
##                                           7groups      8groups     9groups
## Departure.timeEarly Morning           -0.16244024 -0.125744174 -0.09375645
## Departure.timeEvening                 -0.05818589 -0.009314789  0.03587020
## Departure.timeMorning                 -0.02412821 -0.031547717 -0.03281461
## Departure.timeNight                   -0.05231162 -0.054987614 -0.05561965
## Walking.distance.inside.terminal       0.13085343  0.129162929  0.12873751
## Ease.of.making.connections             0.06741029  0.050407912  0.03609961
## Business.executive.lounges            -0.02548698 -0.025401536 -0.02597293
## Comfort.of.waiting.gate.areas          0.12279280  0.115937415  0.11032029
## Cleanliness.of.airport.terminal        0.21151930  0.196167719  0.17870967
## Ambience.of.airport                    0.28559429  0.289877517  0.29512945
## Arrivals.passport.and.visa.inspection -0.90143742 -0.896458853 -0.89119444
## Speed.of.baggage.delivery              0.31757958  0.252697621  0.19211221
## Customs.inspection                    -0.17251488 -0.117063170 -0.06629817
##                                          10groups
## Departure.timeEarly Morning           -0.06563006
## Departure.timeEvening                  0.07795468
## Departure.timeMorning                 -0.02887488
## Departure.timeNight                   -0.05459062
## Walking.distance.inside.terminal       0.12938759
## Ease.of.making.connections             0.02402655
## Business.executive.lounges            -0.02700927
## Comfort.of.waiting.gate.areas          0.10618082
## Cleanliness.of.airport.terminal        0.15937904
## Ambience.of.airport                    0.30141892
## Arrivals.passport.and.visa.inspection -0.88574785
## Speed.of.baggage.delivery              0.13573491
## Customs.inspection                    -0.02032167
rel.imp$lmg *100
##                        Departure.time 
##                            0.52825848 
##      Walking.distance.inside.terminal 
##                            0.36392182 
##            Ease.of.making.connections 
##                            0.66377888 
##            Business.executive.lounges 
##                            0.09527126 
##         Comfort.of.waiting.gate.areas 
##                            0.47996977 
##       Cleanliness.of.airport.terminal 
##                            0.73470857 
##                   Ambience.of.airport 
##                            1.14385209 
## Arrivals.passport.and.visa.inspection 
##                           66.81417591 
##             Speed.of.baggage.delivery 
##                           17.33133758 
##                    Customs.inspection 
##                           11.84472564
rel.imp$lmg.rank
##                        Departure.time 
##                                     7 
##      Walking.distance.inside.terminal 
##                                     9 
##            Ease.of.making.connections 
##                                     6 
##            Business.executive.lounges 
##                                    10 
##         Comfort.of.waiting.gate.areas 
##                                     8 
##       Cleanliness.of.airport.terminal 
##                                     5 
##                   Ambience.of.airport 
##                                     4 
## Arrivals.passport.and.visa.inspection 
##                                     1 
##             Speed.of.baggage.delivery 
##                                     2 
##                    Customs.inspection 
##                                     3

Proportion of variance explained by model: 90.84%

As we can see, the lmg method gives the following percentage importance (rounded to three decimal places):

Departure.time: 0.528%
Walking.distance.inside.terminal: 0.364%
Ease.of.making.connections: 0.664%
Business.executive.lounges: 0.095%
Comfort.of.waiting.gate.areas: 0.480%
Cleanliness.of.airport.terminal: 0.735%
Ambience.of.airport: 1.144%
Arrivals.passport.and.visa.inspection: 66.814%
Speed.of.baggage.delivery: 17.331%
Customs.inspection: 11.845%
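The ranking is easier to read once the shares are sorted. The sketch below copies the percentages from the rel.imp$lmg * 100 output above into a plain named vector (the name `lmg_pct` is introduced here for illustration), so it runs without refitting the model:

```r
# Percent importance copied from the rel.imp$lmg * 100 output above
lmg_pct <- c(
  Departure.time                        = 0.52825848,
  Walking.distance.inside.terminal      = 0.36392182,
  Ease.of.making.connections            = 0.66377888,
  Business.executive.lounges            = 0.09527126,
  Comfort.of.waiting.gate.areas         = 0.47996977,
  Cleanliness.of.airport.terminal       = 0.73470857,
  Ambience.of.airport                   = 1.14385209,
  Arrivals.passport.and.visa.inspection = 66.81417591,
  Speed.of.baggage.delivery             = 17.33133758,
  Customs.inspection                    = 11.84472564
)

# Sort from most to least important and show the cumulative share
ranked <- sort(lmg_pct, decreasing = TRUE)
print(round(cbind(percent = ranked, cumulative = cumsum(ranked)), 2))
```

With the fitted object available, sort(rel.imp$lmg, decreasing = TRUE) would give the same ordering directly.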

Random Forest Model

Since the linear model assumes a linear relationship between the drivers and Overall Satisfaction, and the survey responses are highly associated with one another, we also considered a more complex model, a random forest.

Preparing data for Random Forest Model

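# Convert the rating columns (columns 4 to 37, including Overall.satisfaction)
# to ordered factors. as.ordered() alone is enough; a separate as.factor()
# call beforehand is not needed.
for (i in 4:37) {
  survey_omit[, i] <- as.ordered(survey_omit[, i])
}
# Drop the two columns excluded from the analysis
survey_omit$Quarter <- NULL
survey_omit$Date.recorded <- NULL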
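As a quick illustration of what the conversion does, the sketch below applies as.ordered() to a made-up vector of ratings (the values are invented for this example and are not taken from the survey):

```r
ratings <- c(0, 3, 5, 4, 3, 5)   # made-up ratings for illustration
ordered_ratings <- as.ordered(ratings)

print(levels(ordered_ratings))      # distinct values, in increasing order
print(is.ordered(ordered_ratings))  # the result is an ordered factor
print(ordered_ratings[2] < ordered_ratings[3])  # comparisons respect the level order
```

Because Overall.satisfaction is among the converted columns, randomForest() treats the problem as classification rather than regression, which is why the importance output reports one column per satisfaction level along with Mean Decrease Accuracy and Mean Decrease Gini.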

Model

set.seed(27705)
library(randomForest)
## randomForest 4.6-14
## Type rfNews() to see new features/changes/bug fixes.
## 
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
## 
##     margin
rf <- randomForest(Overall.satisfaction ~. , data=survey_omit, importance = TRUE)
rf$importance
##                                                         0 1             2
## Departure.time                               0.0002013969 0  0.0003333333
## Ground.transportation.to.from.airport        0.0004307648 0  0.0000000000
## Parking.facilities                           0.0006022056 0  0.0000000000
## Parking.facilities..value.for.money.         0.0004920788 0  0.0000000000
## Availability.of.baggage.carts                0.0004436029 0  0.0006666667
## Efficiency.of.check.in.staff                 0.0018011572 0  0.0010000000
## Check.in.wait.time                           0.0024575829 0 -0.0016666667
## Courtesy.of.of.check.in.staff                0.0021730526 0  0.0003333333
## Wait.time.at.passport.inspection             0.0030470929 0 -0.0028333333
## Courtesy.of.inspection.staff                 0.0026914811 0 -0.0040000000
## Courtesy.of.security.staff                   0.0013841036 0 -0.0005000000
## Thoroughness.of.security.inspection          0.0018070536 0  0.0040000000
## Wait.time.of.security.inspection             0.0025554609 0  0.0021666667
## Feeling.of.safety.and.security               0.0031344658 0 -0.0036666667
## Ease.of.finding.your.way.through.the.airport 0.0005518231 0  0.0035000000
## Flight.information.screens                   0.0001552117 0  0.0063333333
## Walking.distance.inside.terminal             0.0009524641 0  0.0055000000
## Ease.of.making.connections                   0.0021159312 0  0.0000000000
## Courtesy.of.airport.staff                    0.0002008149 0 -0.0120000000
## Restaurants                                  0.0015366817 0  0.0000000000
## Restaurants..value.for.money.                0.0015769618 0  0.0000000000
## Availability.of.banks.ATM.money.changing     0.0002464875 0 -0.0030000000
## Shopping.facilities                          0.0014366192 0  0.0003333333
## Shopping.facilities..value.for.money.        0.0022882329 0  0.0020000000
## Internet.access                              0.0002469587 0  0.0010000000
## Business.executive.lounges                   0.0004486848 0  0.0000000000
## Availability.of.washrooms                    0.0020751401 0 -0.0025000000
## Cleanliness.of.washrooms                     0.0030276999 0  0.0086666667
## Comfort.of.waiting.gate.areas                0.0036870193 0  0.0128333333
## Cleanliness.of.airport.terminal              0.0035411703 0  0.0131666667
## Ambience.of.airport                          0.0041679819 0  0.0128333333
## Arrivals.passport.and.visa.inspection        0.2493411142 0  0.0090000000
## Speed.of.baggage.delivery                    0.1081883903 0 -0.0001666667
## Customs.inspection                           0.0141381323 0  0.0050000000
##                                                         3             4
## Departure.time                               0.0037562050  2.325007e-03
## Ground.transportation.to.from.airport        0.0014373995  2.107434e-03
## Parking.facilities                           0.0010595075  1.671025e-04
## Parking.facilities..value.for.money.         0.0011751197  3.233654e-04
## Availability.of.baggage.carts                0.0053944849  1.992465e-03
## Efficiency.of.check.in.staff                 0.0016133008  2.530866e-03
## Check.in.wait.time                           0.0109787108  6.456282e-04
## Courtesy.of.of.check.in.staff                0.0030188454  2.809828e-03
## Wait.time.at.passport.inspection             0.0222749697 -1.450338e-03
## Courtesy.of.inspection.staff                 0.0098526119 -3.479993e-04
## Courtesy.of.security.staff                   0.0266248240  2.381683e-03
## Thoroughness.of.security.inspection          0.0415393889  5.639152e-03
## Wait.time.of.security.inspection             0.0235714856 -3.537745e-03
## Feeling.of.safety.and.security               0.0262847309  2.593797e-03
## Ease.of.finding.your.way.through.the.airport 0.0288575981 -3.185728e-05
## Flight.information.screens                   0.0201140806  1.562988e-03
## Walking.distance.inside.terminal             0.0515735882 -2.951345e-03
## Ease.of.making.connections                   0.0009568472  1.089356e-04
## Courtesy.of.airport.staff                    0.0508051148  9.124264e-04
## Restaurants                                  0.0110669776  1.966803e-03
## Restaurants..value.for.money.                0.0063810231  8.454966e-04
## Availability.of.banks.ATM.money.changing     0.0009022420  2.775898e-03
## Shopping.facilities                          0.0071184898  5.115951e-03
## Shopping.facilities..value.for.money.        0.0029911285  3.762203e-03
## Internet.access                              0.0044698136  2.421927e-04
## Business.executive.lounges                   0.0005990089  4.825242e-04
## Availability.of.washrooms                    0.0680645727  1.967591e-02
## Cleanliness.of.washrooms                     0.0551669120  1.281529e-02
## Comfort.of.waiting.gate.areas                0.0994293842  6.625082e-03
## Cleanliness.of.airport.terminal              0.1245772144  2.715951e-02
## Ambience.of.airport                          0.1268695559  5.109896e-02
## Arrivals.passport.and.visa.inspection        0.1567563324  2.266520e-01
## Speed.of.baggage.delivery                    0.0118883193  1.725672e-02
## Customs.inspection                           0.0347743174  4.938032e-02
##                                                         5
## Departure.time                               0.0046008193
## Ground.transportation.to.from.airport        0.0028761589
## Parking.facilities                           0.0013893261
## Parking.facilities..value.for.money.         0.0006685922
## Availability.of.baggage.carts                0.0006336127
## Efficiency.of.check.in.staff                 0.0092048049
## Check.in.wait.time                           0.0079923772
## Courtesy.of.of.check.in.staff                0.0108037042
## Wait.time.at.passport.inspection             0.0053886653
## Courtesy.of.inspection.staff                 0.0079362803
## Courtesy.of.security.staff                   0.0144159514
## Thoroughness.of.security.inspection          0.0181849071
## Wait.time.of.security.inspection             0.0231823567
## Feeling.of.safety.and.security               0.0183176576
## Ease.of.finding.your.way.through.the.airport 0.0070239251
## Flight.information.screens                   0.0024934103
## Walking.distance.inside.terminal             0.0100574775
## Ease.of.making.connections                   0.0011413432
## Courtesy.of.airport.staff                    0.0042437332
## Restaurants                                  0.0092830426
## Restaurants..value.for.money.                0.0051038297
## Availability.of.banks.ATM.money.changing     0.0017795486
## Shopping.facilities                          0.0056840678
## Shopping.facilities..value.for.money.        0.0057866611
## Internet.access                              0.0035114625
## Business.executive.lounges                   0.0010764334
## Availability.of.washrooms                    0.0114800751
## Cleanliness.of.washrooms                     0.0142066094
## Comfort.of.waiting.gate.areas                0.0321677740
## Cleanliness.of.airport.terminal              0.0475761802
## Ambience.of.airport                          0.0649759003
## Arrivals.passport.and.visa.inspection        0.2752243065
## Speed.of.baggage.delivery                    0.0203153727
## Customs.inspection                           0.0892569571
##                                              MeanDecreaseAccuracy
## Departure.time                                       0.0015555333
## Ground.transportation.to.from.airport                0.0012475202
## Parking.facilities                                   0.0007016814
## Parking.facilities..value.for.money.                 0.0005140451
## Availability.of.baggage.carts                        0.0009029363
## Efficiency.of.check.in.staff                         0.0034614861
## Check.in.wait.time                                   0.0036024042
## Courtesy.of.of.check.in.staff                        0.0041027118
## Wait.time.at.passport.inspection                     0.0033867193
## Courtesy.of.inspection.staff                         0.0034916113
## Courtesy.of.security.staff                           0.0050736908
## Thoroughness.of.security.inspection                  0.0071481220
## Wait.time.of.security.inspection                     0.0065729533
## Feeling.of.safety.and.security                       0.0069724707
## Ease.of.finding.your.way.through.the.airport         0.0027271365
## Flight.information.screens                           0.0015144143
## Walking.distance.inside.terminal                     0.0038719606
## Ease.of.making.connections                           0.0015428921
## Courtesy.of.airport.staff                            0.0027390682
## Restaurants                                          0.0035212939
## Restaurants..value.for.money.                        0.0023493875
## Availability.of.banks.ATM.money.changing             0.0009874546
## Shopping.facilities                                  0.0030877108
## Shopping.facilities..value.for.money.                0.0032608416
## Internet.access                                      0.0010906567
## Business.executive.lounges                           0.0005934971
## Availability.of.washrooms                            0.0089789033
## Cleanliness.of.washrooms                             0.0086156480
## Comfort.of.waiting.gate.areas                        0.0131939613
## Cleanliness.of.airport.terminal                      0.0205033254
## Ambience.of.airport                                  0.0284659665
## Arrivals.passport.and.visa.inspection                0.2473645655
## Speed.of.baggage.delivery                            0.0717289045
## Customs.inspection                                   0.0361283974
##                                              MeanDecreaseGini
## Departure.time                                      27.578207
## Ground.transportation.to.from.airport               17.426113
## Parking.facilities                                  11.319293
## Parking.facilities..value.for.money.                12.427110
## Availability.of.baggage.carts                       11.794447
## Efficiency.of.check.in.staff                        14.997081
## Check.in.wait.time                                  13.846835
## Courtesy.of.of.check.in.staff                       14.879923
## Wait.time.at.passport.inspection                    15.864762
## Courtesy.of.inspection.staff                        14.349553
## Courtesy.of.security.staff                          17.744886
## Thoroughness.of.security.inspection                 20.598643
## Wait.time.of.security.inspection                    19.553825
## Feeling.of.safety.and.security                      19.208359
## Ease.of.finding.your.way.through.the.airport        14.176076
## Flight.information.screens                          13.758136
## Walking.distance.inside.terminal                    18.409828
## Ease.of.making.connections                           9.635924
## Courtesy.of.airport.staff                           20.459790
## Restaurants                                         20.733673
## Restaurants..value.for.money.                       22.008529
## Availability.of.banks.ATM.money.changing            11.687861
## Shopping.facilities                                 17.636889
## Shopping.facilities..value.for.money.               18.337668
## Internet.access                                     21.788125
## Business.executive.lounges                           7.123887
## Availability.of.washrooms                           25.429946
## Cleanliness.of.washrooms                            26.601844
## Comfort.of.waiting.gate.areas                       34.788871
## Cleanliness.of.airport.terminal                     46.902134
## Ambience.of.airport                                 56.302689
## Arrivals.passport.and.visa.inspection              552.744328
## Speed.of.baggage.delivery                          169.216008
## Customs.inspection                                 102.332635

In the model output, Mean Decrease Accuracy and Mean Decrease Gini summarize the overall importance of each variable. This can be explored further with the measure_importance() function in the randomForestExplainer library, which contains a set of tools to help explain which variables are most important in a random forest.
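Mean Decrease Accuracy is based on permutation: the values of one variable are shuffled and the resulting loss in predictive performance is recorded. The sketch below shows that idea in a deliberately simplified form. It is an assumption-laden stand-in, not what randomForest computes: it uses a linear model on the built-in mtcars data, mean squared error instead of classification accuracy, and in-sample rather than out-of-bag predictions.

```r
set.seed(27705)  # seed value reused from the document for reproducibility

# Toy model on built-in data; a linear model stands in for the forest here.
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
mse <- function(data) mean((data$mpg - predict(fit, newdata = data))^2)
baseline <- mse(mtcars)

# Shuffle one predictor at a time and record how much the error grows
permutation_importance <- sapply(c("wt", "hp", "qsec"), function(v) {
  shuffled <- mtcars
  shuffled[[v]] <- sample(shuffled[[v]])
  mse(shuffled) - baseline
})
print(sort(permutation_importance, decreasing = TRUE))
```

A variable whose shuffling barely changes the error contributes little to the predictions, which is the sense in which the small Mean Decrease Accuracy values above indicate weak drivers.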

library(randomForestExplainer)
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
importance_rf <- measure_importance(rf)
min_depth_frame <- min_depth_distribution(rf)
save(min_depth_frame, file = "min_depth_frame.rda")
load("min_depth_frame.rda")
head(min_depth_frame, n = 10)
##    tree                                 variable minimal_depth
## 1     1                      Ambience.of.airport             1
## 2     1    Arrivals.passport.and.visa.inspection             2
## 3     1            Availability.of.baggage.carts             5
## 4     1 Availability.of.banks.ATM.money.changing             4
## 5     1                Availability.of.washrooms             6
## 6     1               Business.executive.lounges             3
## 7     1                       Check.in.wait.time             6
## 8     1          Cleanliness.of.airport.terminal             1
## 9     1                 Cleanliness.of.washrooms             2
## 10    1            Comfort.of.waiting.gate.areas             3
plot_min_depth_distribution(min_depth_frame)

plot_min_depth_distribution(min_depth_frame, mean_sample = "relevant_trees", k = 15)
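Minimal depth is the depth of the first node that splits on a variable in a given tree, so smaller values mean the variable is used closer to the root. As a rough sketch of how such a frame can be summarised by hand, the following rebuilds the ten tree-1 rows printed above and averages the depth per variable. The name min_depth_sample is introduced here for illustration, and with only one tree in the sample the mean is simply that tree's value.

```r
# The ten rows shown by head(min_depth_frame, n = 10), all from tree 1
min_depth_sample <- data.frame(
  tree = 1,
  variable = c("Ambience.of.airport",
               "Arrivals.passport.and.visa.inspection",
               "Availability.of.baggage.carts",
               "Availability.of.banks.ATM.money.changing",
               "Availability.of.washrooms",
               "Business.executive.lounges",
               "Check.in.wait.time",
               "Cleanliness.of.airport.terminal",
               "Cleanliness.of.washrooms",
               "Comfort.of.waiting.gate.areas"),
  minimal_depth = c(1, 2, 5, 4, 6, 3, 6, 1, 2, 3),
  stringsAsFactors = FALSE
)

# Average minimal depth per variable (across however many trees are present)
mean_depth <- aggregate(minimal_depth ~ variable, data = min_depth_sample, FUN = mean)

# Shallowest variables first
mean_depth <- mean_depth[order(mean_depth$minimal_depth), ]
head(mean_depth, 3)
```

On the full min_depth_frame the same two steps would average over all trees, although the package's own mean_min_depth also has to decide how to treat trees in which a variable never appears.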

importance_rf
##                                        variable mean_min_depth no_of_nodes
## 1                           Ambience.of.airport       2.674000        4940
## 2         Arrivals.passport.and.visa.inspection       2.078000        7175
## 3                 Availability.of.baggage.carts       4.958000        3480
## 4      Availability.of.banks.ATM.money.changing       4.940000        3425
## 5                     Availability.of.washrooms       3.976000        4794
## 6                    Business.executive.lounges       5.353864        2010
## 7                            Check.in.wait.time       4.900000        3930
## 8               Cleanliness.of.airport.terminal       2.890000        3893
## 9                      Cleanliness.of.washrooms       3.732000        5035
## 10                Comfort.of.waiting.gate.areas       3.226000        5280
## 11                    Courtesy.of.airport.staff       4.462000        4611
## 12                 Courtesy.of.inspection.staff       4.718000        4100
## 13                Courtesy.of.of.check.in.staff       4.568000        3690
## 14                   Courtesy.of.security.staff       4.174000        3917
## 15                           Customs.inspection       2.945864        2534
## 16                               Departure.time       3.652000        6803
## 17 Ease.of.finding.your.way.through.the.airport       4.368000        3523
## 18                   Ease.of.making.connections       4.913864        2142
## 19                 Efficiency.of.check.in.staff       4.606000        3768
## 20               Feeling.of.safety.and.security       4.028000        3717
## 21                   Flight.information.screens       5.096644        3754
## 22        Ground.transportation.to.from.airport       4.924000        5444
## 23                              Internet.access       4.602000        6439
## 24                           Parking.facilities       5.106000        3627
## 25         Parking.facilities..value.for.money.       4.898000        3851
## 26                                  Restaurants       4.648000        6161
## 27                Restaurants..value.for.money.       4.638000        6688
## 28                          Shopping.facilities       4.554000        5208
## 29        Shopping.facilities..value.for.money.       4.464000        5477
## 30                    Speed.of.baggage.delivery       2.224000        6069
## 31          Thoroughness.of.security.inspection       4.046644        3969
## 32             Wait.time.at.passport.inspection       4.494000        4451
## 33             Wait.time.of.security.inspection       3.952000        4297
## 34             Walking.distance.inside.terminal       4.290000        4093
##    accuracy_decrease gini_decrease no_of_trees times_a_root       p_value
## 1       0.0284659665     56.302689         500           53  3.182806e-12
## 2       0.2473645655    552.744328         500           74 9.626574e-311
## 3       0.0009029363     11.794447         500            0  1.000000e+00
## 4       0.0009874546     11.687861         500            0  1.000000e+00
## 5       0.0089789033     25.429946         500           23  1.204825e-06
## 6       0.0005934971      7.123887         494            0  1.000000e+00
## 7       0.0036024042     13.846835         500            2  1.000000e+00
## 8       0.0205033254     46.902134         500           50  1.000000e+00
## 9       0.0086156480     26.601844         500           26  7.120511e-17
## 10      0.0131939613     34.788871         500           35  1.683698e-32
## 11      0.0027390682     20.459790         500           14  2.366094e-02
## 12      0.0034916113     14.349553         500            3  1.000000e+00
## 13      0.0041027118     14.879923         500            7  1.000000e+00
## 14      0.0050736908     17.744886         500           19  1.000000e+00
## 15      0.0361283974    102.332635         494           50  1.000000e+00
## 16      0.0015555333     27.578207         500            0 4.537606e-236
## 17      0.0027271365     14.176076         500            4  1.000000e+00
## 18      0.0015428921      9.635924         494            2  1.000000e+00
## 19      0.0034614861     14.997081         500            9  1.000000e+00
## 20      0.0069724707     19.208359         500           15  1.000000e+00
## 21      0.0015144143     13.758136         499            3  1.000000e+00
## 22      0.0012475202     17.426113         500            0  8.271402e-46
## 23      0.0010906567     21.788125         500            0 6.060606e-172
## 24      0.0007016814     11.319293         500            0  1.000000e+00
## 25      0.0005140451     12.427110         500            1  1.000000e+00
## 26      0.0035212939     20.733673         500            1 3.210666e-129
## 27      0.0023493875     22.008529         500            0 7.831645e-215
## 28      0.0030877108     17.636889         500            2  2.260923e-27
## 29      0.0032608416     18.337668         500            0  9.336441e-49
## 30      0.0717289045    169.216008         500           63 2.565869e-116
## 31      0.0071481220     20.598643         499           15  1.000000e+00
## 32      0.0033867193     15.864762         500            4  6.679741e-01
## 33      0.0065729533     19.553825         500           18  9.973488e-01
## 34      0.0038719606     18.409828         500            7  1.000000e+00
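The p_value column in the table above can be approximated by hand. The sketch below assumes it comes from a one-sided binomial test of whether a variable is used for splitting more often than it would be if every node picked one of the 34 variables at random. That assumption is ours, based on the package documentation as we understand it; the helper node_p_value() is a name introduced here, and the node counts are copied from the no_of_nodes column.

```r
# no_of_nodes for the 34 variables, in the order printed in the table
no_of_nodes <- c(4940, 7175, 3480, 3425, 4794, 2010, 3930, 3893, 5035, 5280,
                 4611, 4100, 3690, 3917, 2534, 6803, 3523, 2142, 3768, 3717,
                 3754, 5444, 6439, 3627, 3851, 6161, 6688, 5208, 5477, 6069,
                 3969, 4451, 4297, 4093)

total_nodes <- sum(no_of_nodes)       # all splits in the forest
chance <- 1 / length(no_of_nodes)     # assumed probability under random selection

# One-sided test: is a variable used for more splits than chance would predict?
node_p_value <- function(n) {
  binom.test(n, total_nodes, p = chance, alternative = "greater")$p.value
}

p_staff <- node_p_value(4611)          # Courtesy.of.airport.staff
p_passport_wait <- node_p_value(4451)  # Wait.time.at.passport.inspection
c(p_staff, p_passport_wait)
```

If the assumption holds, the two results should land close to the 2.366094e-02 and 6.679741e-01 reported for those variables in the table.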

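The importance_rf table lists several importance measures for each variable, along with a p-value. The p-values relate to how often a variable is used for splitting (no_of_nodes), not to its effect on accuracy, which is why Cleanliness.of.airport.terminal can rank among the top variables and still show a p-value of 1. We can get the top 10 variables using the important_variables() function.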

(vars <- important_variables(importance_rf, k = 10, measures = c("mean_min_depth", "no_of_trees")))
##  [1] "Arrivals.passport.and.visa.inspection"
##  [2] "Speed.of.baggage.delivery"            
##  [3] "Ambience.of.airport"                  
##  [4] "Cleanliness.of.airport.terminal"      
##  [5] "Comfort.of.waiting.gate.areas"        
##  [6] "Customs.inspection"                   
##  [7] "Departure.time"                       
##  [8] "Cleanliness.of.washrooms"             
##  [9] "Wait.time.of.security.inspection"     
## [10] "Availability.of.washrooms"
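To make the selection less of a black box, here is a rough base-R approximation of the ranking. It assumes the variables are ordered by the sum of their dense ranks on the chosen measures (low mean_min_depth is good, high no_of_trees is good); the package's exact scoring and tie handling may differ. The values are copied from six rows of the importance table, and imp_sample and dense_rank() are names introduced here for illustration.

```r
# Six rows taken from the importance_rf table, in table order
imp_sample <- data.frame(
  variable = c("Ambience.of.airport",
               "Arrivals.passport.and.visa.inspection",
               "Business.executive.lounges",
               "Cleanliness.of.airport.terminal",
               "Comfort.of.waiting.gate.areas",
               "Speed.of.baggage.delivery"),
  mean_min_depth = c(2.674, 2.078, 5.353864, 2.890, 3.226, 2.224),
  no_of_trees = c(500, 500, 494, 500, 500, 500),
  stringsAsFactors = FALSE
)

# Dense rank: tied values share a rank and no ranks are skipped
dense_rank <- function(x, decreasing = FALSE) {
  match(x, sort(unique(x), decreasing = decreasing))
}

# Lower score is better: shallow splits and presence in many trees
imp_sample$score <- dense_rank(imp_sample$mean_min_depth) +
  dense_rank(imp_sample$no_of_trees, decreasing = TRUE)

top3 <- head(imp_sample$variable[order(imp_sample$score)], 3)
top3
```

On this subset the three leading variables come out in the same order as in the list above. With all 34 variables the scores are much closer together, so ties become more likely.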

These are the top 10 features that affect customers' Overall Satisfaction. The ranking is based on each variable's mean minimal depth (how close to the root it first splits) and the number of trees in which it occurs. Because both measures are aggregated over all 500 trees, the ranking is fairly robust.

hist(survey.df$Arrivals.passport.and.visa.inspection, main="Arrivals Passport & Visa Inspection")

hist(survey.df$Speed.of.baggage.delivery)

hist(survey.df$Ambience.of.airport)

hist(survey.df$Cleanliness.of.airport.terminal)

hist(survey.df$Comfort.of.waiting.gate.areas)

hist(survey.df$Customs.inspection)

hist(survey.df$Cleanliness.of.washrooms)

hist(survey.df$Wait.time.of.security.inspection)

hist(survey.df$Availability.of.washrooms)

So, the prime area of focus for the airport right now should be the Speed of Baggage Delivery. Improving it should raise visitors' overall satisfaction.